Abstract



The topic of this thesis is the numerical simulation of quantum chromodynamics including dynamical
fermions. Two major problems of most simulation algorithms that deal with dynamical fermions are
(i) their restriction to only two mass-degenerate quarks, and (ii) their limitation to relatively heavy
masses. Realistic simulations of quantum chromodynamics, however, require the inclusion of three light
dynamical fermion flavors. It is therefore highly important to develop algorithms which are efficient in
this situation.

This thesis focuses on the implementation and the application of a novel kind of algorithm which
is expected to overcome the limitations of older schemes. This new algorithm is called the Multiboson
Method. It allows the simulation of an arbitrary number of dynamical fermion flavors, which can in principle
have different masses. It will be shown that it exhibits better scaling properties for light fermions than
other methods. Therefore, it has the potential to become the method of choice.

An explorative investigation of the parameter space of quantum chromodynamics with three flavors
concludes this work. The results may serve as a starting point for future realistic simulations.

1 Introduction



The works of the LORD are great, 
sought out by all them 
that have pleasure in them. 
Psalm 111, Verse 2. 

Throughout history one of the fundamental driving forces of man has been the desire to understand
nature. In the past few centuries, the natural sciences have paved the way to several revolutionary insights
into the structures underlying our world. Some of them have brought fundamental benefits for the quality of
life and the advance of civilization. In particular, in recent decades, computer technology has laid
the foundation for a major leap in several important facets of human existence.

A major challenge is posed by fundamental research, which does not directly aim towards developing 
industrial applications, but instead examines structures and relations relevant for future technologies. 
The goal of fundamental research is to formulate theories which comprise as many different phenomena 
as possible and which are, at the same time, as simple as they can be. 

The branch of the natural sciences which concentrates on the structures underlying matter and energy is
termed particle physics. This field lives on contributions from experiments, which provide insights into how
the particles which constitute our world interact. A further driving force behind particle physics is
the desire to find a simple description of the mechanisms underlying these experiments. The field of
particle physics has benefited from the evolution of computer science, while in turn theoretical
physicists have triggered many pivotal developments in the technology we have today.
Current physical theories categorize the interactions between matter and energy into four different
types of fundamental forces. These forces are gravitation, electromagnetism, and finally the weak and
the strong interactions. The strong interaction is responsible for the forces acting between hadrons,
i.e. between neutrons, protons and nuclei built up from these particles. Its name originates from the
fact that it is the strongest of the four forces on the energy scale of hadronic interactions. The
strengths of the four forces can be stated in terms of their coupling constants [1]:

$$
\begin{aligned}
\alpha_{\mathrm{em}} &= 1/137.035\,999\,76(50), \\
\alpha_{s} &= 0.1185(20), \\
G_{F} &= 1.166\,39(1) \times 10^{-5}\ \mathrm{GeV}^{-2}, \\
G_{N} &= 6.707(10) \times 10^{-39}\ \mathrm{GeV}^{-2}.
\end{aligned}
\tag{1.1}
$$

It is apparent that gravitation is many orders of magnitude weaker than the other forces and is thus
expected to play no role on the energy scales relevant for particle physics so far [2].
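
The size of this gap can be made explicit with a short calculation. The following is only an
illustrative sketch: it assumes that the last two entries of Eq. (1.1) are the Fermi constant of the
weak interaction and Newton's constant of gravitation (the variable names are chosen here for
illustration), and it compares them directly since both are quoted in units of GeV^-2.

```python
import math

# Values quoted in Eq. (1.1); both constants carry units of GeV^-2.
# The identification with the Fermi and Newton constants is an assumption.
fermi_constant = 1.16639e-5    # weak interaction
newton_constant = 6.707e-39    # gravitation

# Dimensionless ratio of the two couplings at a common energy scale
ratio = newton_constant / fermi_constant
orders_of_magnitude = -math.log10(ratio)

print(f"G_N / G_F = {ratio:.3e}")
print(f"gravity is weaker by roughly {orders_of_magnitude:.0f} orders of magnitude")
```

Even compared with the weak interaction, gravitation is thus suppressed by more than thirty orders of
magnitude at hadronic energy scales.
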
In contrast to electromagnetic interactions and gravity, strong and weak forces do not have infinite
ranges. They exhibit finite ranges up to at most the size of a nucleus. Thus, they only play a role in
nuclear interactions, but are almost completely negligible on the level of atoms and molecules. Any
particular process can be dissected into a multitude of processes acting at a smaller scale. Hence, a
process with a given action $S_{\{\mathrm{Process}\}}$ may be described by a set of subprocesses
$\sum_i S_{\{\mathrm{Subprocess}_i\}}$ with smaller actions
$S_{\{\mathrm{Subprocess}_i\}} < S_{\{\mathrm{Process}\}}$. We can tell from observations that processes are
essentially deterministic if the action of the process in question, $S_{\{\mathrm{Process}\}}$, is far larger than some
action $\hbar$ which is known as "Planck's constant" [3]. However, if the action of the process is of the order
of $\hbar$, then the system cannot be described by a deterministic theory any longer and a non-deterministic
theory known as quantum mechanics must be employed. Today, all interactions can be described within
a quantum-mechanical framework, with the exception of gravity, which has so far not been successfully
formulated as a consistent quantum theory. For a discussion of these topics the reader may consult [4]
and references therein.

A necessary requirement for the above iteration to make sense is that it must be possible to recover
a classical (non-probabilistic) theory in a certain limit (naturally the $\hbar \to 0$ limit) from a quantum
theory. This limit is given by the WKB approximation [5]. One finds that an expansion exists, where
the amplitude of a process can be formulated as a power series in $\hbar$:

$$
A_{\{\mathrm{Process}\}} = A_{\{\mathrm{Classical}\}} + \hbar A_1 + \hbar^2 A_2 + \dots \tag{1.2}
$$

In the limit $\hbar \to 0$ the classical amplitude is recovered. But Eq. (1.2) also shows that the quantum
theory contains more information than the classical theory: different choices for the series coefficients
$\{A_1, A_2, \dots\}$ obviously lead to the same classical theory. Thus, if one intends to construct a quantum
theory starting from the classical theory, there is always an ambiguity of how to proceed. The correct
prescription can only be found by experimental means.

There exist several quantization prescriptions to build a quantum theory. All of these have in common
that they describe a system which exhibits a probabilistic interpretation of physical observables
and a deterministic evolution equation of some underlying degrees of freedom. The quantum theory
which is believed to constitute the correct quantum theory of the strong interaction is called quantum
chromodynamics (QCD).

In general, a quantum mechanical system cannot be solved exactly, but only by use of certain
approximations. The approximation most commonly employed is known as perturbation theory and is
applicable in a large variety of cases. It is known to fail, however, when applied to the low-energy
regime of QCD, where a diversity of interesting phenomena occurs. Hence, different techniques commonly
called "non-perturbative" methods must be employed. One of these non-perturbative methods
is the simulation of the system in a large-scale numerical calculation, an approach which is referred to
as "Lattice QCD". This approach is the focus of this thesis.

When performing numerical simulations within Lattice QCD, one finds that the simulation of the
bosonic constituents of QCD, the "gluons", stands at the basis of research efforts; see [6] for a pioneering
publication.

The inclusion of dynamical fermions poses a serious problem. Although it has become clear in [7, 8]
that without dynamical fermions the low-energy hadron spectrum is reproduced with 10% accuracy,
several important aspects of low-energy QCD require the inclusion of dynamical fermions. One case
where the inclusion of dynamical fermions is phenomenologically vital is given by the mass of the $\eta'$
meson, cf.

The numerical simulation of dynamical fermions is plagued by severe difficulties. In particular, the
requirement that the fermions must be light has to be met, since only in this case is the chiral behavior of
QCD, i.e. the behavior at light fermion masses, reproduced correctly. But in this particular limit,
the algorithms suffer from a phenomenon known as critical slowing down, i.e. a polynomial decrease in
efficiency as the chiral point is approached.

Furthermore, the majority of algorithms in use today can only treat two mass-degenerate dynamical
fermion flavors, a situation not present in strong interactions as observed in nature. In fact, one has to
use three dynamical fermion flavors [10].

Thus, the demands on an algorithm for the simulation of dynamical fermion flavors are
(i) the suitability for the simulation of three dynamical fermion flavors, and (ii) the efficiency of the
algorithm with regard to critical slowing down. These two requirements are not met by the commonly
used algorithm in Lattice QCD, the hybrid Monte-Carlo (HMC) algorithm.

The topic of this thesis is the exploration of a new type of algorithm, known as the multiboson algorithm.
This algorithm is expected to be superior to the HMC algorithm with regard to the above properties. In
particular, it will be examined how to tune and optimize this class of algorithms and whether these algorithms
are suitable for the simulation of three light, dynamical, and mass-degenerate fermion flavors.

The thesis is organized as follows: the theoretical background of the strong interaction, the quantization
of field theories, and the definition of lattice gauge theories are given in chapter 2. The tools required
to perform numerical simulations in lattice theories and the analysis of time series are formulated in
chapter 3. The optimization and tuning of the algorithm is discussed in chapter 4.

A direct comparison of the multiboson algorithm with the hybrid Monte-Carlo method is performed in
chapter 5. In particular, the focus has been on the scaling of the algorithms with the quark mass.

A particularly useful application of the multiboson algorithm appears to be the simulation of QCD
with three dynamical fermion flavors. Such a simulation allows one to assess the suitability of multiboson
algorithms for future simulations aimed at obtaining physically relevant results. A first, explorative
investigation of the parameter space which might be relevant for future simulations is presented in
chapter

Finally the conclusions are summarized in chapter

Appendix A contains a short overview of the notation used in this thesis. An introduction to group
theory and the corresponding algebras is given in App. B. The explicit expressions used for the local
actions required for the implementation of multiboson algorithms are listed in App. C. Finally, App. D
explains the concepts required for performing large production runs, where a huge amount of data is
typically generated.

I have to thank many colleagues and friends who have accompanied me during the completion of 
this thesis and my scientific work. I am indebted to my parents, Astrid Borger, Claus Gebert, Ivan 
Hip, Boris Postler, and Zbygniew Sroczynski for the time they invested to proof-read my thesis. For 
the interesting scientific collaborations and many useful discussions I express my gratitude to Guido 
Arnold, Sabrina Casanova, Massimo D'Elia, Norbert Eicker, Federico Farchioni, Philippe de Forcrand, 
Christoph Gattringer, Rainer Jacob, Peter Kroll, Thomas Moschny, Hartmut Neff, Boris Orth, Pavel 
Pobylitza, Nicos Stefanis, and in particular to Istvan Montvay, Thomas Lippert, and Klaus Schilling.

2 Quantum Field Theories and Hadronic Physics



This chapter provides a general introduction to the topic of particle physics. It covers the
phenomenological aspects, the mathematical structures commonly used to describe these systems, and
the particular methods to obtain results from the basic principles.

Section 2.1 gives a general overview of the phenomenology of the strong interaction without making
direct reference to a particular model.

A short overview of classical (i.e. non-quantum) field theories is given in Sec. 2.2. With this basis,
the general principles of constructing a quantum field theory starting from a classical field theory are
presented in Sec. 2.3. The case of non-relativistic theories is covered in Sec. 2.3.1, while the generalization
to relativistic quantum field theories requires far more effort. This is described in Sec. 2.3.2, where the
basic axiomatic frameworks of relativistic quantum field theories are stated.

Particular emphasis will be placed on the path integral quantization which allows for a rigorous and
efficient construction of a quantum theory. Section 2.3.3 provides a detailed treatment of this method and
also contains a discussion of how one can perform computations in practice. An important tool for the
evaluation of path integrals is the concept of ensembles, which is introduced in Sec. 2.3.4. It will turn
out to be essential in numerical simulations of quantum field theories.

Section 2.4 introduces an important class of quantum field theories, namely the class of gauge theories.
These models will be of central importance in the following.

With all necessary tools prepared, Sec. 2.5 will introduce a gauge theory which is expected to be able
to describe the whole phenomenology of the strong interaction. This theory is known as quantum
chromodynamics (QCD) and it is the main subject of this thesis. After a general introduction to the
properties of QCD in Sec. 2.5.1, a method known as factorization is discussed in Sec. 2.5.2. This method
allows one to combine information from the different energy scales and thus provides an essential tool for
actual predictions in QCD calculations. Finally, the method of Lattice QCD is discussed in Sec. 2.5.3.
Lattice simulations exploit numerical integration schemes to gain information about the structure of
QCD, and represent the major tool for the purposes of this thesis.

The construction of a quantum field theory based on path integrals requires a certain discretization
scheme. This scheme is particularly important in lattice simulations. Therefore, Sec. 2.6 covers the
common discretizations for the different types of fields one encounters in quantum field theories. The
case of scalar fields allows for a simple and efficient construction as will be shown in Sec. 2.6.1. The
case of gauge fields is more involved since there exist several proposals for how this implementation should
be done. Contemporary simulations focus mainly on the Wilson discretization, although recently a new
and probably superior method has been proposed. This method, known as D-theory, is reviewed in
Sec. 2.6.3.

The discretization of fermion fields is even more involved. The necessary conditions such a scheme has
to fulfill are given in Sec. 2.6.4, and the scheme used in this thesis, namely the Wilson-fermion scheme,
is constructed. In contrast to the cases of scalar and gauge fields, a large number of different fermion
discretization schemes are used in actual simulations today, and each has its particular advantages and
disadvantages.

This chapter is concluded by the application of the previously discussed discretization schemes to a
gauge theory containing both fermions and gauge fields in Sec. 2.6.5. Such a model is expected to be
the lattice version of gauge theories with fermions, and in particular of QCD.



2.1 Phenomenology of Strong Interactions 

Until 1932 only the electron $e$, the photon $\gamma$ and the proton $p$ were known as elementary particles
(for overviews of the history of particle physics see [11, 12, 13]). The only strong process known
was the $\alpha$-decay of a nucleus. The milestones in this period were the detection of the neutron by
Chadwick in 1932 and the prediction of the $\pi$-meson (today it is customary to call it simply "pion")
by Yukawa in 1935 as the mediator of the strong force. However, it took until 1948 before the charged
pion was actually detected by Lattes. In 1947 particles carrying a new type of quantum number called
"strangeness" were detected by Rochester. In 1950 the neutral pion was detected by Carlson
and Bjorklund. It was soon realized that the hadrons were not point-like objects like the leptons, but
had an internal structure and accordingly a finite spatial extent.

The experiments to observe the structure of hadrons usually consist of scattering two incoming particles
off each other, producing several outgoing particles of possibly different type. If one considers a particular
subset of processes where all outgoing particles are of a determined type, one speaks of exclusive
reactions. A sub-class of exclusive reactions are the elastic scattering processes, where the incoming
and outgoing particles are identical.

The inclusive reactions are obtained by summing over all possible exclusive reactions for given incoming 
particles. Inclusive electron-nucleon scattering at very large energies is called deep-inelastic scattering 
(DIS) and played an important role in the understanding of the structure of hadrons. The prediction 
of scaling by Bjorken in 1969 was confirmed experimentally and led to the insight that the hadrons 
consist of point-like sub-particles. In 1968 Feynman proposed a model which exhibited this feature, 
the parton model. 

A collection of hadrons, as known in the early 60's, is given in Tab. 2.1 together with their properties
in the form of quantum numbers. These quantum numbers are known as spin, parity, electric charge
$Q$, baryon number $B$, and strangeness $S$. They are conserved by the strong interaction¹. The particles
may be divided into several groups according to their spin and parity: the particles with integer spin are
called mesons and the particles with half-integer spin baryons. Because of their parity and spin, the particles
$\pi^0$, $\pi^\pm$, $K^\pm$, $K^0$, $\bar{K}^0$, $\eta$ and $\eta'$ are usually called "pseudoscalar mesons". Similarly, the particles $\rho^0$,
$\rho^\pm$, $\omega$, $K^{*0}$, $K^{*\pm}$, $\bar{K}^{*0}$ and $\phi$ are named "vector mesons". The group $p$, $n$, $\Lambda$, $\Sigma^0$, $\Sigma^\pm$, $\Xi^-$ and $\Xi^0$ is
simply called "baryons". In each group, the members have roughly similar masses (with the exception
of the $\eta'$ in the group of the pseudoscalar mesons).

In 1963 Gell-Mann and Zweig independently proposed a scheme to classify the known particles as
multiplets of the Lie group $SU(3)_f$. It turned out that the classification is indeed possible with the
exception that there were no particles corresponding to the fundamental triplets of the group. This
would imply that the corresponding particles carry fractional charges; particles with such a property
have never been seen in any experiment. However, experiments at SLAC in 1971 involving
neutrino-nucleon scattering clearly indicated that the data could be accounted for if the parton inside the nucleon
had the properties of the particles in the fundamental triplet of the $SU(3)_f$ group. This led finally to the
identification of the (charged) partons from Feynman's model with the particles from the classification
scheme of Gell-Mann and Zweig. These particles are known today as quarks.

¹ Note, however, that the weak interaction violates both the baryon number and the strangeness. While the latter
phenomenon has been observed in experiment so far [11], the former violation may never be observed directly in
earth-bound experiments [14].

Table 2.1: List of selected hadrons with their quantum numbers and their masses in MeV.

Today a huge number of particles subject to the strong interaction is known from different kinds of
experiments. For a complete overview see [1]. To classify them the quark model had to be extended to
six kinds of quarks. They are referred to as having "different flavors", which are new quantum numbers.
Thus, they are conserved by the strong interaction. The quarks cover a huge range of masses. Their
properties are listed in Tab. 2.2. To classify the multiplets of the flavor group $SU(3)_f$, two numbers are
required which are usually called $Y$ (the strong hyper-charge) and $T_3$ (the isospin). They are defined
by

$$
Y = S + B,
$$

and

$$
T_3 = Q - \tfrac{1}{2}\,(S + B) .
$$
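
As a small illustration of these two definitions, the sketch below evaluates $Y$ and $T_3$ for the
three light quarks. The quantum numbers $(Q, B, S)$ entered for u, d and s are the standard
textbook assignments and are supplied here as an assumption, since Tab. 2.2 itself is not reproduced
in this text; the function names are likewise chosen only for this example.

```python
from fractions import Fraction

# (electric charge Q, baryon number B, strangeness S) of the light quarks;
# standard assignments, entered here by hand as an assumption
LIGHT_QUARKS = {
    "u": (Fraction(2, 3), Fraction(1, 3), 0),
    "d": (Fraction(-1, 3), Fraction(1, 3), 0),
    "s": (Fraction(-1, 3), Fraction(1, 3), -1),
}

def hypercharge(baryon_number, strangeness):
    """Strong hyper-charge Y = S + B."""
    return strangeness + baryon_number

def isospin_t3(charge, baryon_number, strangeness):
    """Third isospin component T3 = Q - (S + B) / 2."""
    return charge - Fraction(1, 2) * (strangeness + baryon_number)

for flavor, (q, b, s) in LIGHT_QUARKS.items():
    print(flavor, "Y =", hypercharge(b, s), " T3 =", isospin_t3(q, b, s))
```

The resulting pairs $(T_3, Y)$ are the coordinates at which the quarks appear in a weight diagram of
the fundamental triplet.
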

Table 2.2: Different flavors of quarks together with their associated quantum numbers.

Figure 2.1: Fundamental representations of the $SU(3)_f$ flavor group. The left graph shows the quark
triplet $(u, d, s)$ and the right graph shows the anti-quark triplet $(\bar{u}, \bar{d}, \bar{s})$.

Figure 2.2: Pseudoscalar meson octet together with the singlet (the $\eta'$ state) as classified by the
parameters of the $SU(3)_f$ group.

The three light quarks together with the corresponding anti-particles are shown in Fig. 2.1. The
classifications for the pseudoscalar mesons, the vector mesons and the baryons are given in Figs. 2.2, 2.3
and 2.4. The multiplets containing the particles are irreducible representations of the $SU(3)_f$ group;
they may be built from tensor products of the fundamental triplet in the following way:

$$
\begin{aligned}
3 \otimes \bar{3} &= 8 \oplus 1, \\
3 \otimes 3 \otimes 3 &= 10 \oplus 8 \oplus 8 \oplus 1.
\end{aligned}
\tag{2.1}
$$

This explains why there are always nine particles in each of the meson groups: the first eight belong
to an octet and the remaining one is the singlet state. In the group of the pseudoscalar mesons, the
singlet state $\eta'$ deserves special attention because it is exceptionally heavy.
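
A quick consistency check of such decompositions is to count dimensions on both sides. The sketch
below uses the standard dimension formula for an SU(3) irreducible representation with Dynkin
labels $(p, q)$, namely $(p+1)(q+1)(p+q+2)/2$; the labels assigned to the triplet, octet and decuplet
are supplied here as assumptions, and matching dimensions are of course only a necessary condition,
not a proof of the decomposition.

```python
def su3_dimension(p, q):
    """Dimension of the SU(3) irreducible representation with Dynkin labels (p, q)."""
    return (p + 1) * (q + 1) * (p + q + 2) // 2

triplet = su3_dimension(1, 0)      # quarks
antitriplet = su3_dimension(0, 1)  # anti-quarks
singlet = su3_dimension(0, 0)
octet = su3_dimension(1, 1)
decuplet = su3_dimension(3, 0)

# Mesons: quark times anti-quark gives an octet plus a singlet (nine states)
print(triplet * antitriplet, "=", octet, "+", singlet)
# Baryons: three quarks give a decuplet, two octets and a singlet (27 states)
print(triplet ** 3, "=", decuplet, "+", octet, "+", octet, "+", singlet)
```
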

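Figure 2.3: Vector meson octet together with the singlet (the $\phi$ state) as classified by the parameters
of the $SU(3)_f$ group.

Figure 2.4: Baryon octet as classified by the parameters of the $SU(3)_f$ group.

The particle content to lowest order of the pseudoscalar mesons (Fig. 2.2) in terms of the different quark
flavors is given by

$$
\begin{aligned}
&\pi^+ = \bar{d}u, \quad \pi^- = \bar{u}d, \quad K^0 = d\bar{s}, \quad \bar{K}^0 = s\bar{d}, \quad
K^+ = u\bar{s}, \quad K^- = s\bar{u}, \\
&\pi^0 = \frac{1}{\sqrt{2}}\,(\bar{u}u - \bar{d}d), \quad
\eta = \frac{1}{\sqrt{6}}\,(\bar{u}u + \bar{d}d - 2\bar{s}s), \quad
\eta' = \frac{1}{\sqrt{3}}\,(\bar{u}u + \bar{d}d + \bar{s}s).
\end{aligned}
\tag{2.2}
$$

Similarly the lowest-order particle content of the vector mesons (Fig. 2.3) is given by

$$
\begin{aligned}
&\rho^+ = u\bar{d}, \quad \rho^- = d\bar{u}, \quad K^{*0} = d\bar{s}, \quad \bar{K}^{*0} = s\bar{d}, \quad
K^{*+} = u\bar{s}, \quad K^{*-} = s\bar{u}, \\
&\rho^0 = \frac{1}{\sqrt{2}}\,(\bar{u}u - \bar{d}d), \quad
\omega = \frac{1}{\sqrt{2}}\,(\bar{u}u + \bar{d}d), \quad
\phi = \bar{s}s.
\end{aligned}
\tag{2.3}
$$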
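
The flavor assignments of the charged and strange mesons can be cross-checked by adding up the
quantum numbers of the constituents. The following sketch assumes the standard electric charges and
strangeness of the u, d and s quarks (an anti-quark contributes with the opposite sign); the
dictionary and function names are hypothetical and serve only this example.

```python
from fractions import Fraction

# (electric charge Q, strangeness S) of the light quarks; standard values, assumed here
QUARKS = {
    "u": (Fraction(2, 3), 0),
    "d": (Fraction(-1, 3), 0),
    "s": (Fraction(-1, 3), -1),
}

def meson_numbers(quark, antiquark):
    """Return (Q, S) of a quark/anti-quark state; the anti-quark enters with opposite sign."""
    q_charge, q_strange = QUARKS[quark]
    a_charge, a_strange = QUARKS[antiquark]
    return q_charge - a_charge, q_strange - a_strange

# name: (quark flavor, flavor of the anti-quark)
MESONS = {
    "pi+": ("u", "d"), "pi-": ("d", "u"),
    "K0": ("d", "s"), "K0bar": ("s", "d"),
    "K+": ("u", "s"), "K-": ("s", "u"),
}

for name, (quark, antiquark) in MESONS.items():
    charge, strangeness = meson_numbers(quark, antiquark)
    print(f"{name:6s} Q = {charge!s:>2}  S = {strangeness:+d}")
```

Each state comes out with integer electric charge, as required for an observable hadron, and the kaons
carry strangeness of magnitude one.
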



A very important step was the discovery of the "color" degree of freedom. A first indication towards this
feature was the observation that the $\Delta^{++}$ baryon is a particle with flavor content of three u-type quarks
in the ground state and spin pointing in the same direction. Without an additional quantum number
this would imply that all quarks building up these particles are in the same quantum state, which is not
possible with Fermi-Dirac particles since their wave-functions should anti-commute. Consequently, a
further quantum number should exist, which indeed has been found in experiments. This quantum
number has been termed color and can take on three values. If this property is described again in terms
of an SU(3) group, then the quarks must transform according to the fundamental representation.
The hadrons, however, are color singlets since they display no color charge. This and the observation
that the quarks have never been seen as free particles outside of hadrons led to the hypothesis of
confinement which will be examined more closely in Sec. 2.5.1. According to this hypothesis, free
quarks could never be observed in nature directly since the force between them grows without bound.

2.2 Classical Field Theories

There are quite a few frameworks for the description of a classical² physical system. For a general
review of such frameworks, the reader may consult [15]; for the generalization to field theories, see [16].
For the later generalization to quantum mechanical systems, the Lagrangian method will attract our
attention.

This approach has the advantage that it may be formulated in a coordinate-invariant manner, i.e. it
does not depend on any specific geometry of the physical system. The basic postulate underlying the
dynamics of this formulation is (see e.g. [17] for a textbook overview):

Principle of extremal action: A set of classical fields $\{\varphi_i(x)\}$, $i = 1, \dots, N$, is described by a local $C^2$
function $L(\varphi_i(x), \partial\varphi_i(x))$ which is called the Lagrangian density. The integral of $L$ over a region in
Minkowski-space $\mathbb{M}^4$ is defined as the action of the system:

$$
S[\varphi(x)] = \int d^4x \, L(\varphi_i(x), \partial\varphi_i(x)) .
$$

The equations of motion follow from the requirement that the action becomes minimal. A necessary
condition is

$$
\frac{\delta S}{\delta\varphi_i} = 0 .
$$

From this postulate, we obtain a necessary condition which the functions $\varphi_i(x)$ have to fulfill in order
to describe the motion of the physical system:

$$
\frac{\delta L}{\delta\varphi_i} - \partial_\mu \left( \frac{\delta L}{\delta(\partial_\mu\varphi_i)} \right) = 0 . \tag{2.4}
$$

This set of equations is called the Euler-Lagrange equations. Using $L$, we can define the momentum
conjugate $\pi_i(x)$ of $\varphi_i(x)$ via

$$
\pi_i(x) = \frac{\delta L(\varphi_i(x), \partial\varphi_i(x))}{\delta(\partial_0\varphi_i(x))} .
$$

Then the Hamiltonian $H$ is defined by a Legendre transformation:

$$
H(\varphi_i(x), \pi_i(x)) = \int d^3x \left( \pi_i(x)\,\partial_0\varphi_i(x) - L(\varphi_i(x), \partial\varphi_i(x)) \right) .
$$

From Eq. (2.4) the canonical equations of motion follow:

$$
\partial_0\varphi_i(x) = \frac{\delta H(\varphi_i(x), \pi_i(x))}{\delta\pi_i(x)} , \qquad
\partial_0\pi_i(x) = -\frac{\delta H(\varphi_i(x), \pi_i(x))}{\delta\varphi_i(x)} . \tag{2.5}
$$
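
The content of the principle of extremal action can be illustrated numerically in the simplest
possible setting. The sketch below is not a field theory: it assumes a single degree of freedom $q(t)$
(a harmonic oscillator with unit mass and frequency, i.e. a "field" in zero space dimensions), a
simple time discretization, and a hypothetical deformation `bump` that vanishes at both end points.
It checks that the action is stationary on the classical path and that, for this short time interval,
deformed paths have a larger action.

```python
import math

def discretized_action(path, dt, omega=1.0):
    """Sum of dt * (kinetic - potential energy) over all time slices of the path."""
    total = 0.0
    for q_now, q_next in zip(path, path[1:]):
        velocity = (q_next - q_now) / dt
        midpoint = 0.5 * (q_now + q_next)
        total += dt * (0.5 * velocity**2 - 0.5 * omega**2 * midpoint**2)
    return total

N_STEPS, TOTAL_TIME = 200, 1.0
DT = TOTAL_TIME / N_STEPS
times = [k * DT for k in range(N_STEPS + 1)]

classical_path = [math.sin(t) for t in times]               # solves q'' = -q
bump = [math.sin(math.pi * t / TOTAL_TIME) for t in times]  # vanishes at both ends

def action_along(eps):
    """Action of the classical path deformed by eps * bump."""
    deformed = [q + eps * b for q, b in zip(classical_path, bump)]
    return discretized_action(deformed, DT)

# Central difference estimate of dS/d(eps) at eps = 0
first_variation = (action_along(1e-3) - action_along(-1e-3)) / 2e-3

print("S[classical]        =", action_along(0.0))
print("first variation     =", first_variation)   # zero up to discretization error
print("S[deformed] - S[cl] =", action_along(0.1) - action_along(0.0))  # positive
```

The first variation vanishes up to the discretization error, which is the discrete analogue of the
necessary condition stated above.
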

Similarly to the situation in classical mechanics, the Lagrangian framework and the Hamiltonian framework
provide the bases of two different formulations for the quantum mechanical description of a system.
Since the particles of a quantum theory should transform as representations of multiplets of certain
symmetry groups, the representations of the Poincaré group (see App. B.4) will require special attention.
The lowest representations are given by

1. the singlet representation, described by a scalar field $\phi(x) : x \mapsto \phi(x)$, $x \in \mathbb{M}^4$. The field transforms
as

$$
(\Lambda, a) : x \mapsto x' \;\Rightarrow\; \phi(x) \mapsto \phi(x') .
$$

2. the doublet representation, described by a Weyl spinor field $\xi_a$, transforming as the fundamental
representation under $\mathcal{L}_+^\uparrow$. In this thesis we deal with Dirac spinors composed of two Weyl spinor
fields

$$
\psi = \begin{pmatrix} \xi \\ \chi \end{pmatrix} .
$$

3. the vector representation, described by a four-vector field $A^\mu(x)$, transforming as

$$
A'^{\mu} = \Lambda^{\mu}{}_{\nu} A^{\nu} .
$$

² The meaning of the word "classical" in the title of this section and in the context of this paragraph refers to any
system described by a finite or infinite number of degrees of freedom with deterministic dynamics regardless of the
symmetries of the underlying space-time or the system itself. It should be noted that the term "classical" is perhaps
the word with the largest variety of meanings in the literature of physics: it is used for non-relativistic, non-quantum
systems, in a different context for relativistic quantum field theories and also for anything in between.

The Lagrangians describing the corresponding particles should obey Lorentz- and CPT-invariance and
possibly transform according to an internal symmetry group. For an excellent introduction see [18]. So
far these requirements fix at least the free field Lagrangians. Examples for Lagrangians obeying these
principles are listed in the following:

1. The Lagrangian for a complex scalar field $\phi(x)$ is given by

$$
L(\phi) = (\partial_\mu\phi^\dagger)(\partial^\mu\phi) + m^2\phi^\dagger\phi + V(\phi^\dagger, \phi) , \tag{2.6}
$$

where the free field (i.e. the Lagrangian describing a field propagating without an external force)
is given by $V(\phi^\dagger, \phi) = 0$.

2. The spin-statistics theorem (see App. B.5) suggests that a quantum mechanical spin-1/2 particle
should be described by anticommuting field variables. Thus, the fields should be Grassmann
variables (cf. Appendix B.6). The free field Lagrangian is given by

$$
L(\bar{\psi}, \psi) = \bar{\psi}\,(i\not\partial - m)\,\psi . \tag{2.7}
$$

A generalization is the free $N$-component Yang-Mills field described by an $N$-component vector
$\Psi_N = (\psi_1, \dots, \psi_N)$ of independent fields $\{\psi_i\}$. Its Lagrangian is the sum of the single-field
Lagrangians and thus given by

$$
L(\bar{\Psi}_N, \Psi_N) = \bar{\Psi}_N\,(i\not\partial - m)\,\Psi_N . \tag{2.8}
$$

3. The vector field with a field strength $F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu$ is described in the non-interacting
case by

$$
L(A^\mu) = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} . \tag{2.9}
$$

The generalization to a vector field with several components will be discussed in Sec. 2.4.
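
As a brief worked example, one may insert the free vector-field Lagrangian (2.9) into the
Euler-Lagrange equations of Sec. 2.2. The sketch below assumes the index placement
$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu$ and treats the components $A_\nu$ as the independent
fields; under these assumptions the equations of motion are the source-free Maxwell equations.

```latex
\begin{align*}
\frac{\partial L}{\partial(\partial_\mu A_\nu)}
  &= -\frac{1}{2}\,F^{\alpha\beta}\,
     \frac{\partial F_{\alpha\beta}}{\partial(\partial_\mu A_\nu)}
   = -\frac{1}{2}\,F^{\alpha\beta}
     \left(\delta^\mu_\alpha\,\delta^\nu_\beta - \delta^\mu_\beta\,\delta^\nu_\alpha\right)
   = -F^{\mu\nu}, \\
\frac{\partial L}{\partial A_\nu} &= 0
  \quad\Longrightarrow\quad
  \frac{\partial L}{\partial A_\nu}
  - \partial_\mu\,\frac{\partial L}{\partial(\partial_\mu A_\nu)}
  = \partial_\mu F^{\mu\nu} = 0 .
\end{align*}
```

The antisymmetry of $F^{\mu\nu}$ is used in the last step of the first line.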



2.3 Quantization 

As has been pointed out in Chapter 1, the world of elementary particle physics is essentially of
quantum mechanical nature. However, any theory of particle physics should also obey the principle of
Lorentz invariance. Therefore the need arises to find a quantum mechanical model which is at least
globally invariant under the Poincaré group. This is hard to do within the framework of quantum
mechanics for point-like particles. In fact, the single-particle interpretation of the relativistic Dirac
equation is subject to several paradoxes (see e.g. [19]) which can only be resolved if one considers
instead fields (or rather generalized concepts called operator-valued distributions) [18].

2.3.1 Non-relativistic Quantum Mechanics

Before embarking on the definition of a relativistic quantum field theory, we should recall the concepts
of a non-relativistic quantum theory. In general, a quantum theory has the following general structure
[20, 21]:

Hilbert space $\mathcal{H}$: The discussion will now be limited to pure states, which are given by unit rays of a
complex Hilbert space $\mathcal{H}$ with scalar product $\langle\cdot|\cdot\rangle$.

Observables: An operator $A$ on $\mathcal{H}$ is called an observable if it is a self-adjoint operator on $\mathcal{H}$. Thus, its
eigenvalues are real. Observables correspond to quantities which can be measured in an experiment.
If the system is in a state $|\psi\rangle \in \mathcal{H}$, the expectation value $\langle A\rangle$ of the observable $A$ is given
by

$$
\langle A\rangle = \langle\psi|A\psi\rangle .
$$

Symmetries: The symmetries of the system are represented by unitary (or anti-unitary) operators on
$\mathcal{H}$.

Evolution: The evolution equation of the states, the Schrödinger equation, is given by

$$
\frac{\partial}{\partial t}\,|\psi\rangle = -iH\,|\psi\rangle . \tag{2.10}
$$

The Hamiltonian operator $H$ is a Hermitian operator acting in $\mathcal{H}$.

The state space $\{|\psi\rangle\}$ may be finite, countably infinite or even uncountably infinite. In the case that it is finite,
the solution of Eq. (2.10) is well-defined and can be found by diagonalizing the operator $H$.
But already in the case of an infinite but countable state space, there may be physically equivalent
observables which are not unitarily equivalent [19, 22]. However, we will see below that information
about the quantum field theory can be extracted even without knowing the complete state space.
In fact, only few cases are known where the state space has been constructed in a mathematically
rigorous manner. So far, this does not include any interacting quantum field theory in four space-time
dimensions.
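
For a finite state space the statement above can be made concrete. The following sketch assumes a
two-dimensional Hilbert space, units in which $\hbar = 1$, and a hypothetical Hamiltonian chosen purely
for illustration; the helper functions are not part of any library. The Schrödinger equation is solved
by diagonalizing $H$, and the expectation value of an observable is evaluated as $\langle\psi|A\psi\rangle$.

```python
import cmath
import math

def eigensystem_2x2(h):
    """Eigenvalues and orthonormal eigenvectors of a 2x2 Hermitian matrix."""
    a, d, b = h[0][0].real, h[1][1].real, h[0][1]
    if b == 0:
        return [(a, [1.0, 0.0]), (d, [0.0, 1.0])]
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), abs(b))
    pairs = []
    for energy in (mean - half_gap, mean + half_gap):
        vec = [b, energy - a]  # solves (h - energy) vec = 0
        norm = math.sqrt(sum(abs(c) ** 2 for c in vec))
        pairs.append((energy, [c / norm for c in vec]))
    return pairs

def evolve(h, psi0, t):
    """Return exp(-i h t) psi0 using the spectral decomposition of h."""
    psi_t = [0j, 0j]
    for energy, vec in eigensystem_2x2(h):
        overlap = sum(v.conjugate() * p for v, p in zip(vec, psi0))  # <n|psi0>
        phase = cmath.exp(-1j * energy * t)
        for k in range(2):
            psi_t[k] += phase * overlap * vec[k]
    return psi_t

def expectation(op, psi):
    """Expectation value <psi|op psi> of a Hermitian 2x2 operator."""
    op_psi = [sum(op[r][c] * psi[c] for c in range(2)) for r in range(2)]
    return sum(p.conjugate() * q for p, q in zip(psi, op_psi)).real

hamiltonian = [[0, 1], [1, 0]]   # hypothetical example: couples the two basis states
observable = [[1, 0], [0, -1]]   # measures which basis state is occupied

for t in (0.0, math.pi / 4, math.pi / 2):
    psi = evolve(hamiltonian, [1, 0], t)
    norm = sum(abs(c) ** 2 for c in psi)
    print(f"t = {t:.3f}  <A> = {expectation(observable, psi):+.3f}  norm = {norm:.3f}")
```

The norm of the state stays equal to one, reflecting the unitarity of the time evolution, while the
expectation value oscillates between the two eigenvalues of the observable.
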



2.3.2 The Axioms of Relativistic Quantum Field Theories 

The mathematically rigorous formulation of relativistic quantum field theories requires the introduction of operator-like objects replacing the classical fields; however, it turns out to be impossible to use operator-valued functions to define $\phi(x)$ since the relativistic quantum field is too singular at short distances [21]. Rather, a quantum field must be defined as an operator-valued distribution $\phi[f(x)]$; the only objects with physical meaning are thus given by the smeared fields $\phi[f]$, where $f(x)$ is a smooth test function in the Schwartz space $\mathcal{S}(\mathbb{R}^4)$ and 

\[ \phi[f] = \int dx\, \phi[f(x)] . \tag{2.11} \]

The fact that the $\phi$-operators only acquire a meaning in conjunction with the functions $f(x)$ already displays the need to regularize a quantum field theory, a requirement that must be implemented by all methods striving to compute observables. With this notation, a relativistic quantum field theory can be postulated using the Gårding-Wightman axioms [21] (where the discussion is now limited to the case of a single scalar field in four dimensions): 

States: The states of the system are the unit rays of a separable Hilbert space $\mathcal{H}$. There is a distinguished state $|\Omega\rangle$, called the vacuum. 

Fields: There exists a dense subspace $D \subset \mathcal{H}$, and for each test function $f$ in $\mathcal{S}(\mathbb{R}^4)$ there exists an operator $\phi[f]$ with domain $D$, such that: 

1. The map $f \mapsto \langle\psi_1|\phi[f]|\psi_2\rangle$ is a tempered distribution $\forall\, |\psi_1\rangle, |\psi_2\rangle \in D$. 

2. For all real-valued $f(x)$, the operator $\phi[f]$ is Hermitian. 

3. The vacuum $|\Omega\rangle$ belongs to $D$. 

4. $\phi[f]$ leaves $D$ invariant: an arbitrary $|\psi\rangle \in D$ implies that $\phi[f]\,|\psi\rangle \in D$. 

5. The set $D_0$ of finite linear combinations of vectors of the form $\phi[f_1]\cdots\phi[f_n]\,|\Omega\rangle$ with $n \ge 0$ and $f_1,\dots,f_n \in \mathcal{S}(\mathbb{R}^4)$ is dense in $\mathcal{H}$. 

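Relativistic covariance: There is a continuous unitary representation $U(a,\Lambda)$ of the proper orthochronous Poincaré group $\mathcal{P}_+^\uparrow$ such that 

1. $|\psi\rangle \in D$ implies $U(a,\Lambda)\,|\psi\rangle \in D$. 

2. $U(a,\Lambda)\,|\Omega\rangle = |\Omega\rangle \quad \forall\,(a,\Lambda) \in \mathcal{P}_+^\uparrow$. 

3. $U(a,\Lambda)\,\phi[f(x)]\,U(a,\Lambda)^{-1} = \phi[f(\Lambda^{-1}(x-a))]$. 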

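Spectral condition: The joint spectrum of the infinitesimal generators of the translation subgroup $U(a,1)$ is contained in the forward light cone 

\[ \bar{V}_+ = \{\, p = (p^0,\mathbf{p}) \in \mathbb{R}^4 \;:\; p^0 \ge |\mathbf{p}| \,\} . \]

Locality: If $f$ and $g$ have spacelike-separated supports, then $\phi[f]$ and $\phi[g]$ commute: 

\[ \left(\phi[f]\,\phi[g] - \phi[g]\,\phi[f]\right)|\psi\rangle = 0 \quad \forall\, |\psi\rangle \in D . \]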

The quantities of major interest are the vacuum expectation values of products of field operators, $\mathfrak{W}_n(f_1,\dots,f_n)$. These objects are called Wightman distributions: 

\[ \mathfrak{W}_n(f_1,\dots,f_n) = \langle\Omega|\,\phi[f_1]\cdots\phi[f_n]\,|\Omega\rangle . \tag{2.12} \]

It can be shown [23] using the so-called "reconstruction theorem" that all information of the quantum theory can be obtained from these vacuum expectation values. Essentially, the theorem allows one to construct the state space as well as the field operators. Since the $\mathfrak{W}_n$ are numerical-valued quantities, they are much easier to work with than the operator-valued fields $\phi$. This allows for a simpler treatment of the problem. Assuming that some smearing functions $\{f_i(x)\}$, $1 \le i \le n$, peaked around the points $\{x_i\}$ have been chosen, one can speak of Wightman functions and introduce the more convenient notation 

\[ \mathfrak{W}_n(x_1,\dots,x_n) = \mathfrak{W}_n(f_1(x),\dots,f_n(x)) . \tag{2.13} \]

A very powerful observation is the non-trivial fact that the Wightman functions may be analytically continued from the Minkowski space $\mathbb{M}^4$ to Euclidean space $\mathbb{R}^4$. This can be done by applying the following transformation of a four-vector $x$ in Minkowski space 

\[ x \mapsto x' : \quad x' = (x'_0, \mathbf{x}) = (-i x_0, \mathbf{x}) . \tag{2.14} \]

The Minkowski metric $g_{\mu\nu}$ is changed to $\delta_{\mu\nu}$. This transformation is also known as the Wick rotation. This allows for the definition of the Schwinger functions$^3$ 

\[ \mathfrak{S}_n(x_1,\dots,x_n) = \mathfrak{W}_n(x'_1,\dots,x'_n) . \tag{2.15} \]

As discussed in [24], this analytic continuation is possible in the whole complex plane, i.e. the Schwinger distributions exist if all $x_i$ are distinct. It can be shown using the Gårding-Wightman axioms [21] that the $\mathfrak{S}_n(x_1,\dots,x_n)$ have the following properties: 

$^3$ Again one should not forget that the objects under consideration are in fact distributions. 

Reflection positivity: Let $\theta x = \theta(x^0,\mathbf{x}) = (-x^0,\mathbf{x})$ denote reflection on the real axis. The $\mathfrak{S}_n$ satisfy the condition 

\[ \mathfrak{S}_n(\theta x_1,\dots,\theta x_n) = \mathfrak{S}_n^*(x_1,\dots,x_n) . \]

Euclidean invariance: The $\mathfrak{S}_n$ are invariant under all Euclidean transformations $(a,R)$ with $R \in SO(4)$: 

\[ \mathfrak{S}_n(Rx_1 + a,\dots,Rx_n + a) = \mathfrak{S}_n(x_1,\dots,x_n) . \]

Positive definiteness: Define the composition of test functions $f_i \in \mathcal{S}(\mathbb{R}^{4i})$, $i = 0,\dots,n$, by (with $k + l \le n$) 

\[ (f_k \otimes f_l)(x_1,\dots,x_{k+l}) = f_k(x_1,\dots,x_k)\, f_l(x_{k+1},\dots,x_{k+l}) . \]

The $\mathfrak{S}_n$ then obey the following condition: 

\[ \sum_{k,l=0}^{n} \mathfrak{S}_{k+l}\left( f_k(\theta x_k,\dots,\theta x_1) \otimes f_l(x_1,\dots,x_l) \right) \ge 0 . \]

Permutation symmetry: The $\mathfrak{S}_n$ are symmetric in their arguments. 

Now there is another important theorem due to Osterwalder and Schrader: given the Schwinger distributions (2.15) satisfying the above conditions, one can reconstruct the whole quantum field theory in Minkowski space [25, 26, 27]. So the axioms due to Gårding-Wightman and Osterwalder-Schrader are equivalent and one can use the Osterwalder-Schrader framework to actually define a relativistic quantum field theory. 

Consequently, it is sufficient to compute the Schwinger functions for a Euclidean quantum field theory and then reconstruct the Minkowski theory from them. As will be discussed later on, the Schwinger functions are easier to handle than the Wightman distributions. However, this has to be taken with a grain of salt: physical observables calculated in the Euclidean theory must afterwards be analytically continued back to Minkowski space to allow for a comparison with experiments. After all, the physical quantities are defined in the Minkowski theory and not in the Euclidean domain. In some cases (e.g. for the correlation length of the two-point function, which is the inverse of the particle mass), the results are identical, i.e. the inverse Wick rotation does not change the value obtained in the calculation. However, there exist many cases where the analytic continuation is non-trivial. For details the reader is encouraged to consult [24]. 

The relations between the different axiomatic settings discussed so far are given in Fig. 2.5. From the Wightman distributions, the whole QFT can be constructed. However, the Osterwalder-Schrader axioms are an equivalent formulation. The Schwinger distributions and the Wightman distributions are related by analytic continuation. 

The permutation symmetry of the Schwinger functions allows for the construction of a generating functional. In contrast, the Wightman functions are only symmetric for spacelike-separated arguments. Thus, they cannot be computed in terms of a generating functional. Another elegant way to define generating functions in Minkowski space is the introduction of Feynman functions which can be defined as "time-ordered" products of field operators: 

\[ \mathfrak{F}_n(x_1,\dots,x_n) = \langle\Omega|\,T\{\phi[x_1]\cdots\phi[x_n]\}\,|\Omega\rangle , \tag{2.16} \]

where the time-ordering is defined as the product with factors arranged so that the one with the latest time-argument is placed leftmost, the next-latest next to the leftmost, etc. jTH]. There is also an alternative definition in [28]. It is given by applying a Fourier transform to the Schwinger function and performing an analytic continuation back to Minkowski space afterwards. This is displayed in Fig. 2.6. 

Figure 2.5: Relations of different axiomatic frameworks for quantum field theory. 



By construction, the Feynman functions are also symmetric for timelike-separated arguments and thus they are symmetric for arbitrary arguments. Hence, both the Schwinger and the Feynman functions allow for the construction of a generating functional, $W_{\mathfrak{S}}[J]$ and $W_{\mathfrak{F}}[J]$. Only the Schwinger functions will be considered here. Formally, the generating functional (sometimes also called the vacuum-vacuum functional) is given by 

\[ W_{\mathfrak{S}}[J] = \sum_{n=0}^{\infty} \frac{1}{n!} \int dx_1 \cdots dx_n\, \mathfrak{S}_n(x_1,\dots,x_n)\, J(x_1)\cdots J(x_n) = \langle\Omega|\exp\left[\int d^4x\, \phi[f]\, J(x)\right]|\Omega\rangle , \tag{2.17} \]

where the functions $J(x_i)$ are taken from the Schwartz space $\mathcal{S}(\mathbb{R}^4)$. Using $W_{\mathfrak{S}}[J]$, the Schwinger functions can be recovered by a functional derivative 

\[ \mathfrak{S}_n(x_1,\dots,x_n) = \left. \frac{\delta^n W_{\mathfrak{S}}[J]}{\delta J_1 \cdots \delta J_n} \right|_{J=0} . \tag{2.18} \]

Knowledge of $W_{\mathfrak{S}}$ is thus equivalent to solving the quantum field theory. 

Figure 2.6: Relations between the different kinds of n-point functions consisting of vacuum expectation values of products of field operators. The figure has been taken from [28]. 



2.3.3 The Path Integral 

Having now discussed what a quantum field theory is, one needs a recipe of how to construct it. In fact, there exist several prescriptions of how to build a quantum theory if the Lagrangian of the classical field theory to which the quantum theory should reduce is known. The two most commonly used quantization schemes are the canonical quantization scheme (which is described in detail in standard textbooks like [5, 20, 22, 18]) and the path-integral quantization which will be used in this thesis. Both of these schemes break the general covariance of the classical theory discussed in Sec. 2.2. The quantum theory still stays invariant under global Lorentz transformations (and there even exist generalizations to curved, but fixed spacetimes, see e.g. and references therein), but the quantization prescriptions implicitly assume the existence of a global, canonical basis. However, local Lorentz invariance is only of importance for a quantum theory of gravity, so everything said in the following can be applied to any quantum theory of strong interactions discussed in Sec. 2.1. 



Construction Principle 



Before attempting to define a prescription for a quantum field theory, let us go back to the case of non-relativistic quantum mechanics. The notion of a path integral is closely related to the notion of a random walk. To make this relation obvious, consider the expectation value $\langle E\rangle$ of the evolution operator applied to a single particle in one dimension between two states $|x\rangle, |y\rangle \in \mathcal{H}$: 

\[ \langle E\rangle = \langle x|\exp[-iHt]\,|y\rangle . \tag{2.19} \]

$\langle E\rangle$ is the probability amplitude for the particle to move from position $x$ to position $y$ in time $t$. If the Hamiltonian corresponds to a free particle, 

\[ H = \frac{p^2}{2m} , \]

then the solution to (2.19) can be given immediately [24]: 

\[ \langle E\rangle = \left(\frac{m}{2\pi i t}\right)^{1/2} \exp\left[\frac{i m}{2t}(x-y)^2\right] . \tag{2.20} \]

On the other hand, the probability for a one-dimensional random walk to go from position $x$ to position $y$ in time $t$ is given by [28]: 

\[ P(x-y,t) = \left(\frac{1}{4\pi D t}\right)^{1/2} \exp\left[-\frac{(x-y)^2}{4Dt}\right] , \tag{2.21} \]

with $D$ being the diffusion constant. The quantum mechanical expectation value is obtained by analytic continuation of Eq. (2.21) to imaginary time and the identification $D = 1/(2m)$. Thus, the quantum mechanical amplitude may be computed by considering a classical random walk and analytically continuing the result to imaginary time. If one adopts this interpretation, the amplitude can be computed via a path integral using a conditional Wiener measure, see [28] for a rigorous mathematical treatment. To extend Eq. (2.20) also to the case of non-Gaussian Hamiltonians, we decompose the full Hamiltonian $H$ into a Gaussian and a non-Gaussian part: 



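\[ H = H_0 + V(x) , \tag{2.22} \]

and perform a time-slicing procedure. Consider the evolution operator $U(t) = \exp[-iHt]$ for small imaginary times $t \to -i\epsilon$. In leading order, $U(-i\epsilon) = \exp[-\epsilon H]$ coincides with the operator $W(\epsilon)$, which is defined in the following way: 

\[ U(-i\epsilon) = W(\epsilon) + O(\epsilon^3) = \exp\left(-\frac{\epsilon}{2}V\right) \exp\left(-\epsilon H_0\right) \exp\left(-\frac{\epsilon}{2}V\right) + O(\epsilon^3) . \tag{2.23} \]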



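The operator $W(\epsilon)$ is known as the transfer matrix; its matrix elements can be computed to yield: 

\[ \langle x|W(\epsilon)|y\rangle = \left(\frac{m}{2\pi\epsilon}\right)^{1/2} \exp\left[-\frac{m}{2\epsilon}(x-y)^2 - \frac{\epsilon}{2}\left(V(x) + V(y)\right)\right] . \tag{2.24} \]

Using the Lie-Trotter formula, one gets (with $\epsilon = t/N$): 

\[ \exp\left[-(H_0 + V)\,t\right] = \lim_{N\to\infty} W^N(\epsilon) . \tag{2.25} \]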
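
The transfer matrix can be checked numerically before taking any limit. The following is a minimal sketch, assuming a harmonic potential $V(x) = x^2/2$ with $m = 1$ (an illustrative choice, not taken from the text) and a hypothetical position grid; it discretizes the kernel of Eq. (2.24) and reads off the lowest energies from the largest eigenvalues of $W(\epsilon)$, which should approach $E_0 = 1/2$ and $E_1 - E_0 = 1$ up to corrections that vanish with $\epsilon$:

```python
import numpy as np

def transfer_matrix(x, eps, m=1.0, potential=lambda x: 0.5 * x**2):
    """Matrix elements <x_i|W(eps)|x_j> of Eq. (2.24), times the grid spacing."""
    dx = x[1] - x[0]
    diff = x[:, None] - x[None, :]
    v = potential(x)
    exponent = -m * diff**2 / (2.0 * eps) - 0.5 * eps * (v[:, None] + v[None, :])
    return dx * np.sqrt(m / (2.0 * np.pi * eps)) * np.exp(exponent)

x = np.linspace(-6.0, 6.0, 241)          # hypothetical grid with spacing 0.05
eps = 0.1                                # hypothetical Euclidean time step
W = transfer_matrix(x, eps)

eigenvalues = np.linalg.eigvalsh(W)      # ascending order; W is real symmetric
E0 = -np.log(eigenvalues[-1]) / eps      # largest eigenvalue <-> ground state
E1 = -np.log(eigenvalues[-2]) / eps      # second largest <-> first excited state
```

Since $W^N(\epsilon)$ has eigenvalues $\lambda_i^N = \exp[-E_i N \epsilon]$, the same spectrum governs the large-time behavior of the product in Eq. (2.25).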



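Inserting $N-1$ times the identity $1 = \int dx_i\, |x_i\rangle\langle x_i|$ into Eq. (2.25) finally yields the expression: 

\[ \langle x|\exp[-Ht]\,|y\rangle = \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} \int dx_1 \cdots dx_{N-1}\, \exp\left[-\frac{m}{2\epsilon}\left((x-x_1)^2 + \dots + (x_{N-1}-y)^2\right)\right] \]
\[ \qquad\qquad \times \exp\left[-\epsilon\left(\frac{1}{2}V(x) + V(x_1) + \dots + V(x_{N-1}) + \frac{1}{2}V(y)\right)\right] \]
\[ \qquad = \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} \int dx_1 \cdots dx_{N-1}\, \exp\left[-S_N(x,x_1,\dots,x_{N-1},y)\right] . \tag{2.26} \]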



In the continuum limit this equation can now be interpreted as a path integral over a set of random walks with the weight function given in the exponential. If we denote all paths $\omega(\tau)$ with fixed end-points from $\omega(\tau=0) = x_0 = x$ to $\omega(\tau=t) = x_N = y$, then we can write (using the Wiener measure $[d\omega]$): 

\[ \langle E\rangle = \int [d\omega]\, \exp(-V[\omega]) = \lim_{N\to\infty} \int dx_1 \cdots dx_{N-1}\, P(x_0,\dots,x_N)\, \exp\left(-V(x_0,\dots,x_N)\right) , \tag{2.27} \]

where 

\[ P(x_0,\dots,x_N) = \prod_{i=0}^{N-1} P(x_{i+1} - x_i,\, t_{i+1} - t_i) . \]

Now there is an important difference between the forms of Eqs. (2.26) and (2.27): in the latter, the exponential weight which connects neighboring points of the paths is already a part of the measure $[d\omega]$, while in the former the exponential weight is contained in the expression for $S_N(x_0,\dots,x_N)$. What is then the interpretation of this weight factor? From Eq. (1.2) we know that any amplitude can be expanded in a power series in $\hbar$ with the amplitude for the classical process being the leading amplitude. Thus, in the limit $\hbar \to 0$, only the classical (leading) contribution should contribute to the expression (2.26). The exponential weight factor should thus be peaked around the classical solution, i.e. the exponent $S_N$ becomes minimal for the classical trajectory, just like minimizing the action $S[\omega(\tau)]$ yields the classical path. Thus, the exponent in the limit $N \to \infty$ should coincide with the classical action if one inserts a differentiable trajectory. However, there is an important difference: the classical action $S[\omega(\tau)]$ is only defined for differentiable paths, while the exponent $\lim_{N\to\infty} S_N(x_0,\dots,x_N)$ in (2.26) is defined for any continuous path (the continuous paths form a superset of the differentiable paths). This gives rise to a certain freedom in the choice of $S_N(x_0,\dots,x_N)$. The actual choice should thus be guided by the desire to simplify the problem at hand. Especially in the case of chiral fermions, a wide class of possible actions has been proposed, see 
SecEEl 

Symbolically we can thus introduce a functional $S[\omega(\tau)]$ which maps any continuous path to a real number and write 

\[ \langle x|\exp[-Ht]\,|y\rangle = \int \mathcal{D}x\, \exp[-S[x]] , \tag{2.28} \]

where the integral measure is given by 

\[ \mathcal{D}x = \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} dx_1 \cdots dx_{N-1} . \tag{2.29} \]



The resulting expression Eq. (2.28) can be analytically continued back to real times using $t \to it$, which yields the desired transition amplitude Eq. (2.20). Such a Wick-rotated form of Eq. (2.28) is known as the Feynman-Kac formula. Sometimes the derivation is directly carried out in Minkowski space, but the problem is that the integrand is highly oscillatory and not well-defined; for further reading 

cf. iSDj . 
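
For the free particle, the continuation between the random-walk kernel (2.21) and the amplitude (2.20) can be verified directly with complex arithmetic. The following is a minimal sketch, assuming $\hbar = 1$; the function names and the numerical values of $m$, $t$ and $x - y$ are hypothetical and serve only as a spot check:

```python
import cmath
import math

def diffusion_kernel(dx, t, D):
    """Random-walk kernel of Eq. (2.21); t is allowed to be complex."""
    return cmath.sqrt(1.0 / (4.0 * math.pi * D * t)) * cmath.exp(-dx**2 / (4.0 * D * t))

def free_amplitude(dx, t, m):
    """Free-particle amplitude of Eq. (2.20) for real time t (hbar = 1)."""
    return cmath.sqrt(m / (2.0j * math.pi * t)) * cmath.exp(1j * m * dx**2 / (2.0 * t))

m, t, dx = 1.3, 0.7, 0.4                      # hypothetical test values
continued = diffusion_kernel(dx, 1j * t, D=1.0 / (2.0 * m))
amplitude = free_amplitude(dx, t, m)
```

With the identification $D = 1/(2m)$ and the substitution $t \to it$, the two expressions agree to rounding precision.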

As already pointed out, the difference between (2.27) and (2.28) lies in the interpretation of the measure. One can perform substitutions $\mathcal{D}x \to \mathcal{D}x'$ in (2.29), giving rise to different integral measures. Since the path integral closely resembles a system of statistical mechanics (via its affinity to the random walk), we will classify the different classes of paths which can be used in (2.28) by means of ensembles. This topic is discussed in detail in Sec. 2.3.4. For the time being, we want to interpret (2.26) as an integral over random paths with a weight given by the entire exponential. This amounts to choosing the measure 

\[ \mathcal{D}x \propto \lim_{N\to\infty} dx_1 \cdots dx_{N-1} . \]

Later in Sec. 2.3.4 it will be argued that these paths are taken from the random ensemble. In contrast, in the expression (2.27) using the Wiener measure $[d\omega]$, the paths are taken from the canonical ensemble. This integral measure already contains the kinetic term, but not the potential term. In this way, the problem of assigning a meaning to the derivative from the classical action is circumvented. This procedure is not possible in the case of quantum field theories, which will be discussed below, since in that case there is no such thing as a Wiener measure. 

Computing Observables 

As discussed in Sec. 2.3.2, one is interested in ground-state expectation values of certain operators, $\langle 0|A|0\rangle$. Consider a (countable) Hilbert space $\mathcal{H}$ with Hamiltonian $H$. Let $\{E_i\}$, $i \ge 0$, be the eigenvalues and $\{|i\rangle\}$ be the corresponding eigenvectors of $H$ in ascending order. Taking the trace of the evolution operator and $A$ provides us with 

\[ \mathrm{Tr}\left(A \exp[-H\tau]\right) = \sum_{i=0}^{\infty} \exp[-E_i \tau]\, \langle i|A|i\rangle . \]

In the limit $\tau \to \infty$ only the term with $E_0$ in the exponential survives and we are left with 

\[ \langle 0|A|0\rangle = \lim_{\tau\to\infty} \frac{1}{Z(\tau)}\, \mathrm{Tr}\left(A \exp[-H\tau]\right) , \tag{2.30} \]

with the partition function 

\[ Z(\tau) = \mathrm{Tr} \exp[-H\tau] = \sum_{i=0}^{\infty} \exp[-E_i \tau] . \tag{2.31} \]
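
The projection onto the ground state in Eqs. (2.30) and (2.31) can be illustrated with a small matrix. The following is a minimal sketch, assuming a hypothetical $3 \times 3$ Hamiltonian and observable (neither is taken from the text); it evaluates the ratio of traces at increasing $\tau$ and compares it with the exact ground-state expectation value:

```python
import numpy as np

def thermal_average(H, A, tau):
    """Tr(A exp(-H tau)) / Tr(exp(-H tau)), evaluated in the eigenbasis of H."""
    energies, vectors = np.linalg.eigh(H)
    # Shifting by the lowest energy cancels in the ratio and avoids underflow.
    weights = np.exp(-(energies - energies[0]) * tau)
    diag_A = np.einsum("ji,jk,ki->i", vectors.conj(), A, vectors).real  # <i|A|i>
    return float(np.sum(weights * diag_A) / np.sum(weights))

# Hypothetical Hamiltonian and observable, chosen only for illustration.
H = np.array([[0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])
A = np.diag([1.0, 2.0, 3.0])

ground_state = np.linalg.eigh(H)[1][:, 0]
exact = float(ground_state @ A @ ground_state)
estimates = [thermal_average(H, A, tau) for tau in (1.0, 5.0, 30.0)]
```

The deviation from the exact value decays like $\exp[-(E_1 - E_0)\tau]$, so the energy gap sets how large $\tau$ has to be in practice.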



For the application to field theory, the operators $x(\tau_n)$ will require special attention, since they are analogous to the Schwinger functions encountered in Euclidean quantum field theories in Sec. 2.3.2. Using $x(\tau) = \exp[H\tau]\, x\, \exp[-H\tau]$, we consider the n-point correlation function $\langle x(\tau_1) \cdots x(\tau_n)\rangle$, with $\tau_1 < \dots < \tau_n$. It is straightforward [23] to show that 

\[ \langle 0|x(\tau_1) \cdots x(\tau_n)|0\rangle = \lim_{\tau\to\infty} \frac{1}{Z(\tau)}\, \mathrm{Tr}\left( e^{-H(\tau/2 - \tau_n)}\, x\, e^{-H(\tau_n - \tau_{n-1})}\, x \cdots x\, e^{-H(\tau_1 + \tau/2)} \right) \]
\[ \qquad = \lim_{\tau\to\infty} \frac{1}{Z(\tau)} \int \mathcal{D}x\, x(\tau_1) \cdots x(\tau_n)\, \exp[-S[x(\tau)]] , \tag{2.32} \]

where the paths obey periodic boundary conditions, 

\[ x(-\tau/2) = x(\tau/2) , \]

and the partition function $Z(\tau)$ can be written as 

\[ Z(\tau) = \int \mathcal{D}x\, \exp[-S[x(\tau)]] . \tag{2.33} \]

Hence, the n-point correlation function $\langle x(\tau_1) \cdots x(\tau_n)\rangle$ is written in (2.32) as the moment of the measure $\mathcal{D}x$. There is another possibility to obtain the correlation function from a generating functional $Z_J[J(\tau)]$ with $J(\tau)$ being a continuous path by means of the following definition: 

\[ Z_J[J(\tau)] = \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} \int dx_1 \cdots dx_{N-1}\, \exp\left[-S(x_0,\dots,x_N) + \sum_{i=0}^{N} x_i J(\tau_i)\right] . \tag{2.34} \]

Using 

\[ \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} \int dx_1 \cdots dx_{N-1}\, \frac{d}{dJ(\tau_1)} \cdots \frac{d}{dJ(\tau_n)} \exp\left[-S(x_0,\dots,x_N) + \sum_{i=0}^{N} x_i J(\tau_i)\right] \]
\[ \qquad = \lim_{N\to\infty} \left(\frac{m}{2\pi\epsilon}\right)^{N/2} \int dx_1 \cdots dx_{N-1}\, x(\tau_1) \cdots x(\tau_n)\, \exp\left[-S(x_0,\dots,x_N) + \sum_{i=0}^{N} x_i J(\tau_i)\right] , \tag{2.35} \]

one recovers Eq. (2.32) after setting $J = 0$ and dividing by $Z(\tau)$. The meaning of the derivatives can be understood by considering again the subset of differentiable paths. The expression then reduces to the functional derivative. Thus, $Z_J[J(\tau)]$ can be considered to be the generating functional for the n-point correlation functions and we can write symbolically: 

\[ \langle x(\tau_1) \cdots x(\tau_n)\rangle = \frac{1}{Z_J[0]} \left. \frac{\delta^n Z_J[J]}{\delta J(\tau_1) \cdots \delta J(\tau_n)} \right|_{J=0} . \tag{2.36} \]

Here the notation of the conventional functional derivative has been employed, but with a meaning corresponding to Eq. (2.35). This expression is similar to the generating functional of the Schwinger functions (2.18), implying that generalizing $Z_J[J(\tau)]$ to the case of Euclidean fields is the key to finding a quantization prescription for quantum field theories. 
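
For a Gaussian action the role of the generating functional can be checked in closed form. The following is a minimal sketch, assuming a discretized harmonic oscillator on a small periodic lattice with hypothetical parameters; for $S = \frac{1}{2} x^T K x$ the standard Gaussian integral gives $Z_J / Z_0 = \exp\left(\frac{1}{2} J^T K^{-1} J\right)$, so the second derivative with respect to the sources at $J = 0$ should reproduce the two-point function $(K^{-1})_{ij}$:

```python
import numpy as np

def generating_functional(J, K):
    """Z_J / Z_0 for the Gaussian action S = x.K.x / 2 (closed form)."""
    return float(np.exp(0.5 * J @ np.linalg.solve(K, J)))

def two_point(i, j, K, h=1e-3):
    """Central finite-difference estimate of d^2 Z_J / dJ_i dJ_j at J = 0."""
    n = K.shape[0]
    e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
    Z = lambda J: generating_functional(J, K)
    return (Z(e_i + e_j) - Z(e_i - e_j) - Z(-e_i + e_j) + Z(-e_i - e_j)) / (4 * h * h)

# Hypothetical lattice parameters: 8 time slices with periodic boundary conditions.
n_sites, eps, m, omega = 8, 0.5, 1.0, 1.0
shift = np.roll(np.eye(n_sites), 1, axis=1)
# S = sum_i [ m (x_{i+1} - x_i)^2 / (2 eps) + eps omega^2 x_i^2 / 2 ] = x.K.x / 2
K = (m / eps) * (2.0 * np.eye(n_sites) - shift - shift.T) + eps * omega**2 * np.eye(n_sites)

propagator = np.linalg.inv(K)
estimate = two_point(0, 3, K)
```

For non-Gaussian actions no such closed form exists, which is why the derivative prescription is only symbolic in general.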

Euclidean Field Theory 

The generalization of Eq. (2.36) to the case of Euclidean fields is very difficult, however. As a starting point one can expect that the expectation values for the Schwinger function in (2.15) can also be written using a path integral just like the n-point functions in (2.32). They would then be moments of some suitably defined measure 

\[ [d\phi] = N \exp[-S[\phi]]\, \mathcal{D}\phi . \tag{2.37} \]

The Schwinger functions $\mathfrak{S}_n(x_1,\dots,x_n)$ can hence be written 

\[ \mathfrak{S}_n(x_1,\dots,x_n) = \frac{1}{W_{\mathfrak{S}}[0]} \int [d\phi]\, \phi(x_1) \cdots \phi(x_n) , \tag{2.38} \]

where the generating functional $W_{\mathfrak{S}}[0]$ is the field theory analogue of Eq. (2.33). It can symbolically be written as 

\[ W_{\mathfrak{S}}[0] = \int [d\phi] = Z . \tag{2.39} \]

The functional $S[\phi]$ appearing in Eq. (2.37) is again a suitable generalization of the Euclidean action to a superset of continuous, but non-differentiable fields. The vacuum expectation value $\langle A\rangle = \langle\Omega|A|\Omega\rangle$ of a general operator, $A[\phi(x)]$, is then defined by the path integral 

\[ \langle A\rangle = Z^{-1} \int [d\phi]\, A[\phi] , \tag{2.40} \]

where the partition function $Z$ is given by (2.39). However, this definition encounters severe difficulties because of the fact that the $\phi[f]$ are not pointwise-defined objects. 

By inverting the logic which led to the path-integral formula Eq. (2.27), one can define a prescription to formulate a quantum field theory starting from a classical action $S$. This procedure, which gives meaning to Eq. (2.37), is called renormalization theory and consists of the following steps [21]: 

1. Regularize the theory by imposing an ultraviolet cutoff $\Lambda = a^{-1}$ (where $a$ is a distance short compared to the intrinsic scales of the theory) so that (2.37) is a well-defined measure. This can e.g. be done by discretizing the Euclidean space $\mathbb{R}^4$ to describe the system using a (finite) lattice in $\mathbb{Z}^4_\Omega$ such that all $\{x_i\} \in \mathbb{Z}^4_\Omega$. Find a functional $S_{\{g_i\}}[\phi(x)]$ with parameters $\{g_i\}$ on the lattice which reduces to the classical action $S[\phi(x)]$ for differentiable continuum fields. This prescription is not unique. In any case, however, either Euclidean invariance or Osterwalder-Schrader positivity or both are broken. Let $\langle\phi(x_1) \cdots \phi(x_n)\rangle_{\{g_i\}}$ be the n-point functions of the discrete theory. 

2. Perform the infinite volume limit $\Omega \to \infty$ for the system with $\{g_i\}$ held fixed. This limit must exist and be unique. 

3. Allow the parameters $\{g_i\}$ of $S_{\{g_i\}}[\phi]$ to be functions of $\Lambda$: $\{g_i\} \mapsto \{g_i(\Lambda)\}$. The parameters occurring in (2.37) are then called the bare parameters. 

4. Perform the continuum limit $\Lambda = a^{-1} \to \infty$. A continuum quantum field theory is obtained from the sequence of lattice theories by rescaling the lengths by a factor $\Lambda$ and rescaling the fields by a factor $Z(\Lambda)$: 

\[ \mathfrak{S}_n(x_1,\dots,x_n) = \lim_{\Lambda\to\infty} Z(\Lambda)^n\, \langle\phi(x_1) \cdots \phi(x_n)\rangle_{\{g_i(\Lambda)\}} . \tag{2.41} \]

For each choice $\{g_i(\Lambda)\}$ check the convergence properties of the $\mathfrak{S}_n$ and whether they satisfy the Osterwalder-Schrader axioms. 

5. Consider all possible choices of $\{g_i(a^{-1})\}$ and $Z(a^{-1})$; classify all limiting theories $\mathfrak{S}_n(x_1,\dots,x_n)$ and study their properties. 
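
The requirement in step 1 that the lattice functional reduce to the classical action for differentiable fields can be illustrated in a simplified setting. The following is a minimal sketch, assuming a free scalar field in one Euclidean dimension with periodic boundary conditions, a forward-difference derivative and a hypothetical smooth test configuration; none of these choices is prescribed by the text, and other discretizations would serve equally well:

```python
import numpy as np

def lattice_action(n_sites, length=10.0, mass=1.0):
    """a * sum_i [ (phi_{i+1} - phi_i)^2 / (2 a^2) + mass^2 phi_i^2 / 2 ]."""
    a = length / n_sites                               # lattice spacing
    tau = a * np.arange(n_sites)
    phi = np.sin(2.0 * np.pi * tau / length)           # smooth, periodic test field
    dphi = (np.roll(phi, -1) - phi) / a                # forward difference
    return float(a * np.sum(0.5 * dphi**2 + 0.5 * mass**2 * phi**2))

def continuum_action(length=10.0, mass=1.0):
    """Integral of [phi'^2 / 2 + mass^2 phi^2 / 2] for phi = sin(2 pi tau / length)."""
    k = 2.0 * np.pi / length
    return 0.25 * length * (k**2 + mass**2)

errors = [abs(lattice_action(n) - continuum_action()) for n in (10, 100, 1000)]
```

For this smooth configuration the discrepancy shrinks like $a^2$; for the non-differentiable fields that dominate the measure no such statement holds, which is the origin of the freedom mentioned in step 1.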

This procedure may give rise to continuum theories which can be categorized as follows: 

No limit: For at least one $n$, the limit (2.41) does not exist. 

Unimportant limit: All resulting $\{\mathfrak{S}_n\}$ exist, but are devoid of information (like $\mathfrak{S}_n = 0\ \forall n$ etc.) 

Gaussian limit: The limiting theory $\{\mathfrak{S}_n\}$ is Gaussian, i.e. a generalized free field. This situation is commonly referred to as triviality. 

Non-Gaussian limit: The limiting theory is non-Gaussian, giving rise to a nontrivial theory. This may, however, still imply that the scattering matrix is the identity. 

For a non-trivial limit to exist, the lattice theories should have correlation lengths $\xi(\Lambda) \sim \Lambda^{-1} \xi\{g_i(\Lambda)\}$ as $\Lambda \to \infty$ (otherwise the physical lengths would get rescaled to 0). Thus, the parameters $\{g_i(a^{-1})\}$ should approach or sit on the critical surface and the theory must undergo a phase transition of second order where the correlation lengths diverge. This is expected to be the case for most interesting quantum field theories whose critical behavior can be handled using the renormalization group of Wilson, see [31] for the historical paper and [23] for standard textbooks. There is also a second very interesting case where $\xi\{g_i(\Lambda)\} = \infty$ for all $\Lambda$, i.e. the parameters $\{g_i\}$ already sit on the critical surface for finite lattice spacings. This is e.g. the case in non-compact U(1) pure gauge theories. For compact U(1) the situation is less clear so far; for a description of simulation results consult the work of Arnold jSH] and references therein. 

Despite the huge phenomenological successes of quantum field theories in practice, a rigorous proof that the resulting theory exists in the sense defined above has been given so far only for a few special cases. In four dimensions, so far only free fields have been proven with mathematical rigor to give rise to a relativistic quantum field theory. 

Evaluation of Path Integrals 

Having now a definition for the path integral, we also need a way to evaluate it. In principle there are two different ways to compute expressions of the form (2.28) and (2.40): 

• Consider a series of weight factors $\{\exp[-S_i]\}$, $i = 1,\dots,N$, which converges to the desired weight factor, $\lim_{N\to\infty} \exp[-S_N] = \exp[-S]$. The path integrals (2.28) should be computable for each $\exp[-S_i]$. 

• Compute an approximation to (2.28) for finite $N$ in the measure (2.29). The resulting approximation will depend on $N$. Then perform the limit $N \to \infty$. 

As has already been mentioned, taking the limit in (2.41) is only possible in some simple models, or in the case that the resulting integrals have Gaussian shape. One way to also extend the applicability to non-Gaussian models is thus to approximate the "true" function $W_{\mathfrak{S}}$ by integrable Gaussian models which reduce to $W_{\mathfrak{S}}$ in some suitable limit. 

The most popular way to do this is to expand the exponential into a Gaussian part and a small, non-Gaussian part: 

\[ \exp[-S[\phi]] = \exp[-S_G[\phi]] \sum_{n=0}^{\infty} \frac{1}{n!} \left[-\left(S[\phi] - S_G[\phi]\right)\right]^n . \tag{2.42} \]

The idea is then to form the path integral of the r.h.s. of Eq. (2.42) and take the result to be the sum of all contributions. The problem with the series obtained this way is that in several cases the sum fails to converge. This is the case for the common four-dimensional models, as was first noted by Dyson in [34]. 

As an example consider the "field theory" at a single site with "partition function" [jSj 

\[ Z = \frac{1}{\sqrt{2\pi}} \int d\phi\, \exp\left(-\phi^2/2 - g\phi^4\right) . \tag{2.43} \]

The function $Z = Z(g)$ contains an essential singularity at the origin. Performing the perturbative expansion QTZfy yields 

\[ Z = \sum_{k=0}^{\infty} g^k Z_k , \qquad Z_k = \frac{(-1)^k}{\sqrt{2\pi}\, k!} \int d\phi\, \exp\left(-\phi^2/2 + 4k \ln\phi\right) , \]

which has a convergence radius of $g = 0$. Performing a semi-classical expansion around the saddle point $\phi_c = 2\sqrt{k}$ and integrating over the quadratic deviations yields 

\[ Z_k \approx \frac{(-16)^k}{\sqrt{\pi}} \exp\left((k - 1/2)\ln k - k\right) . \]

Obviously the $Z_k$ grow without bound (their magnitude increases like $16^k k^{k-1/2} e^{-k}$, i.e. factorially in $k$), but the power series is at least asymptotic in the complex $g$ plane cut along the negative real axis since 

\[ \left| Z(g) - \sum_{k=0}^{n} g^k Z_k \right| \le \frac{4^{n+1}\, \Gamma(2n + 3/2)}{\sqrt{\pi}\, (n+1)!}\, \frac{|g|^{n+1}}{\left(\cos(\tfrac{1}{2}\mathrm{Arg}\, g)\right)^{2n+3/2}} . \]

This means that for fixed $n$ the right hand side can be made arbitrarily small by choosing $g$ small enough. It may even be possible to recover the full partition function $Z$ from the series expansion $\{Z_k\}$ using resummation. For recent reviews of the application of resummation techniques consult [36, 37]. 
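
The asymptotic character of the single-site expansion can be made visible numerically. The following is a minimal sketch, assuming the sign convention $\exp(-\phi^2/2 - g\phi^4)$ for the integrand of Eq. (2.43), the exact Gaussian moments $Z_k = (-1)^k (4k-1)!!/k!$, and a hypothetical coupling $g = 0.01$; it compares the partial sums with a direct quadrature of $Z(g)$:

```python
import math

def z_exact(g, half_width=12.0, steps=20_000):
    """Z(g) of Eq. (2.43) by the trapezoidal rule (the integrand decays rapidly)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        phi = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * phi * phi - g * phi**4)
    return total * h / math.sqrt(2.0 * math.pi)

def z_coefficient(k):
    """Z_k = (-1)^k (4k-1)!! / k!, the k-th coefficient of the expansion in g."""
    double_factorial = math.prod(range(1, 4 * k, 2))    # (4k-1)!!, equal to 1 for k = 0
    return (-1) ** k * double_factorial / math.factorial(k)

def partial_sum(g, n):
    """Perturbative series truncated after the order-n term."""
    return sum(g**k * z_coefficient(k) for k in range(n + 1))

g = 0.01                                                # hypothetical coupling
reference = z_exact(g)
errors = [abs(reference - partial_sum(g, n)) for n in range(21)]
best_order = min(range(len(errors)), key=errors.__getitem__)
```

The error first decreases with the order, reaches a minimum near $k \approx 1/(16g)$, and then grows without bound, which is the typical signature of an asymptotic series.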
Despite these conceptual difficulties, perturbation theory turns out to be the most effective approach to treat many problems in quantum field theory provided the expansion parameter is sufficiently small. However, in several situations of interest, the latter condition is not fulfilled and the perturbative expansion is not even asymptotic, or the expansion parameter is too large, causing it to diverge already in the lowest orders. In these situations, one has to resort to different ways to approximate the Schwinger functions. One possibility is a numerical simulation of Euclidean QFT on the finite, discrete lattice $\mathbb{Z}^4_\Omega$. It has some very intriguing advantages: it does not resort to any assumptions about the dynamics of the model one is examining other than the information underlying the regularized action, and it is directly based on the definition of the quantities under consideration. In essence, any operator $A[\phi(x)]$ corresponding to a physical observable can be written via (2.40) as the corresponding moment of a measure $[d\phi]$ on the underlying space. The ensemble of field configurations $\phi(x)$ is distributed according to the partition function (2.39). Consequently, the latter is the quantity which one tries to access in numerical simulations. 

However, this approach has the shortcoming that the actual continuum limit can never be performed and at best one has to resort to extrapolation techniques giving rise to further uncertainties. Since the actual shape of the Schwinger functions is not recovered, an analytic continuation to Minkowski space is not possible either and objects like distribution amplitudes are not directly accessible. Nonetheless it is possible to compute integrals over these functions and their moments, which help to shed light on their behavior. This approach has been used in e.g. [38, 39, 40, 41, 42, 43] to extract information about form factors and structure functions from lattice simulations. 

One important question is whether the theory is renormalizable if one uses a perturbative expansion. There are models which are renormalizable non-perturbatively, but are non-renormalizable when employing a perturbative expansion. This is the case for the Gross-Neveu model at large $N$ in three dimensions 

ED 

However, due to the great importance of perturbative methods, the models which are perturbatively renormalizable are considered in most practical applications. This means that one has to choose Lagrangians with mass dimension $d(\mathcal{L}) \le 4$ [35], where the mass dimensions for the scalars $\phi$, Dirac spinors $\psi$ and vector fields $A_\mu$ and their derivatives are given by: 

\[ d(\phi) = 1 , \qquad d(\partial^n \phi) = 1 + n , \]
\[ d(\psi) = 3/2 , \qquad d(\partial^n \psi) = 3/2 + n , \]
\[ d(A_\mu) = 1 , \qquad d(\partial^n A_\mu) = 1 + n . \tag{2.44} \]

The mass dimension of a composite term in the Lagrangian is given by adding the mass dimensions of its factors. A dimension-four term then corresponds to a renormalizable interaction, less than four is super-renormalizable and greater than four is non-renormalizable. 
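
The counting rule of Eq. (2.44) is simple enough to automate. The following is a minimal sketch in which the field labels, the representation of a term as a list of `(field, number_of_derivatives)` pairs, and the function names are all hypothetical conventions introduced for illustration; conjugate fields are assumed to carry the same dimension as the fields themselves:

```python
from fractions import Fraction

# Mass dimensions of the elementary fields in four dimensions, Eq. (2.44).
FIELD_DIMENSION = {"phi": Fraction(1), "psi": Fraction(3, 2), "A": Fraction(1)}

def mass_dimension(factors):
    """Sum the dimensions of all factors; each derivative adds one unit."""
    return sum((FIELD_DIMENSION[name] + n for name, n in factors), Fraction(0))

def classify(factors):
    """Classify a Lagrangian term by its mass dimension."""
    d = mass_dimension(factors)
    if d < 4:
        return "super-renormalizable"
    if d == 4:
        return "renormalizable"
    return "non-renormalizable"

phi4 = [("phi", 0)] * 4                            # phi^4
phi3 = [("phi", 0)] * 3                            # phi^3
gauge_coupling = [("psi", 0), ("A", 0), ("psi", 0)]  # psi-bar A psi
four_fermion = [("psi", 0)] * 4                    # (psi-bar psi)^2
```

For instance, `classify(four_fermion)` yields a term of dimension six and hence a perturbatively non-renormalizable interaction in four dimensions.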

2.3.4 Ensembles 

Following its definition, Eq. (2.40), the quantum mechanical vacuum expectation value $\langle A\rangle$ of some functional $A[\phi]$ of the fundamental fields in the theory, $\phi(x)$, can be written as the moment of the measure (2.37). As discussed in Sec. 2.3.3, the analytic treatment of equation (2.40) is only possible in case the path integral has the shape of a Gaussian or in some toy models. If one does not want to resort to expansion techniques or simplifying assumptions at this stage, the only alternative method known today is the numerical treatment of (2.40). However, a straightforward integration does not appear to be feasible, since the dimensionality of the integral in simulations as they are run today easily exceeds $10^6$ [44]. The only alternative is therefore a Monte-Carlo integration. To define possible techniques for treating this problem, the concept of ensembles of configurations has turned out to be extremely useful [24]: 

Ensembles: An ensemble $(\{\phi\}, \rho\{\phi\}, [d\phi])$ consists of an infinite number of field configurations $\{\phi\}$ with a density $\rho\{\phi\}$ defined on the measure $[d\phi]$. 

A simple example is the micro-canonical ensemble, which is defined by 

\[ \rho\{\phi\}_{\mu\text{-can}} \propto \delta\left(S[\phi] - C\right) , \tag{2.45} \]

with a constant $C \in \mathbb{R}$. Thus, this ensemble only consists of configurations with a fixed action. Obviously, this ensemble cannot be used for the evaluation of (2.40), since the majority of configurations appearing in the path integral are not members of $\{\phi\}_{\mu\text{-can}}$. To take account of the need to include any possible configuration in the ensemble, we also have to introduce the notion of ergodicity: 

Ergodicity: An ensemble $(\{\phi\}, \rho\{\phi\}, [d\phi])$ is called ergodic if 
$\rho\{\phi\} > 0 \quad \forall\, \phi \in \mathbb{R}^4$. 

An example of an ergodic ensemble is given by the random ensemble, where each possible field configuration enters with equal probability: 

\[ \rho\{\phi\}_{\text{rand}} \propto 1 . \tag{2.46} \]

With the measure $[d\phi]_{\text{rand}}$ from the random ensemble, the expression (2.40) becomes 

\[ \langle A\rangle = Z^{-1} \int [d\phi]_{\text{rand}}\, e^{-S[\phi]} A[\phi] , \qquad Z = \int [d\phi]_{\text{rand}}\, e^{-S[\phi]} . \tag{2.47} \]

Switching to different ensembles in path integrals consists of a re-parameterization of the measure. It is therefore equivalent to the substitution rule in ordinary integrals. 

Another example of an ergodic ensemble is given by the canonical ensemble (also known as the
"equilibrium ensemble"), which is defined by

$$\rho\{\phi\}_{\text{can}} \propto e^{-S[\phi]} . \tag{2.48}$$

The measure in (2.37) corresponds to the canonical ensemble and therefore underlies the path
integral definition in Eq. (2.40). Due to this simple form of the operator expectation value, the canonical
ensemble (2.48) plays a major role in numerical simulations of quantum field theories.
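Finally, an important generalization of the canonical ensemble is given by the multi-canonical ensemble.
Suppose the underlying action in Eq. (2.48) is replaced by a modified action, $S[\phi] \to \tilde{S}[\phi]$, which
differs from $S[\phi]$ by an additional term controlled by some parameter $\gamma$. The ensemble
$\{\phi\}_{\text{multi-can}}$ with density

$$\rho\{\phi\}_{\text{multi-can}} \propto e^{-\tilde{S}[\phi]} \tag{2.49}$$

leads to the following shape of (2.40):

$$\langle A \rangle = Z_\gamma^{-1} \int [d\phi]_{\text{multi-can}}\, e^{\tilde{S}[\phi] - S[\phi]} A[\phi] , \qquad Z_\gamma = \int [d\phi]_{\text{multi-can}}\, e^{\tilde{S}[\phi] - S[\phi]} . \tag{2.50}$$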

The reason why (2.49) is useful is that it is often possible to find an action $\tilde{S}[\phi]$ which is numerically
simpler to handle and simulate than the original action $S[\phi]$, with the ensembles (2.48) and (2.49)
being close enough to each other that the "reweighting correction" in (2.50) is small. Such a situation
arises in this thesis in the framework of the TSMB algorithm, to be discussed later.
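
The reweighting idea behind (2.50) can be illustrated with a one-dimensional toy integral. The following is only a minimal sketch under stated assumptions: the two actions `action` and `simple_action`, the sample size and the observable are hypothetical choices made for illustration and are not taken from this thesis. Configurations are drawn from the simpler (Gaussian) ensemble and corrected by the factor $e^{\tilde{S} - S}$.

```python
import math
import random


def action(phi):
    """Toy 'original' action S[phi] (assumed): quadratic plus quartic term."""
    return 0.5 * phi ** 2 + 0.1 * phi ** 4


def simple_action(phi):
    """Toy 'simpler' action S~[phi] (assumed): purely Gaussian, easy to sample."""
    return 0.5 * phi ** 2


def reweighted_average(observable, n_samples=200_000, seed=1):
    """Estimate <A> for `action` from samples distributed as exp(-simple_action)."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_samples):
        phi = rng.gauss(0.0, 1.0)  # density proportional to exp(-S~[phi])
        weight = math.exp(simple_action(phi) - action(phi))  # reweighting factor
        numerator += weight * observable(phi)
        denominator += weight
    return numerator / denominator


def quadrature_average(observable, lower=-8.0, upper=8.0, n_points=40_001):
    """Reference value by direct summation, feasible only in very low dimension."""
    step = (upper - lower) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for i in range(n_points):
        phi = lower + i * step
        weight = math.exp(-action(phi))
        numerator += weight * observable(phi)
        denominator += weight
    return numerator / denominator


estimate = reweighted_average(lambda phi: phi * phi)
reference = quadrature_average(lambda phi: phi * phi)
```

Because the two toy actions are close, the weights stay of order one and the estimate agrees with the reference within the statistical error. If the ensembles were far apart, a few configurations would dominate the sum and the estimate would become noisy.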

The ensemble is given by an infinite set of field configurations $\{\phi\}$. The introduction of ensembles
thus apparently made the problem of integrating a complicated multi-dimensional system even worse
instead of simplifying it. However, the re-formulation of the problem allows for a solution by a different
integration technique, the Monte-Carlo integration [24, 32, 35]. This numerical method is
discussed in Sec. 3.1.

2.4 Gauge Theories 

The guiding principle of the construction of quantum field theories in Sec. 2.3 was the idea of locality.
For a start, consider the $N$-component ($N \geq 2$) Yang-Mills theory described by the Lagrangian

$$\mathcal{L}(\bar{\Psi}_N, \Psi_N) = \bar{\Psi}_N \left(i\not{\partial} - m\right) \Psi_N , \tag{2.51}$$

which is invariant under global transformations $U \in \mathrm{SU}(N)$:

$$U : \Psi_N \mapsto \Psi'_N = U \Psi_N . \tag{2.52}$$

However, a global transformation on the fields living in $\mathbb{R}^4$ is not consistent with the idea of locality.
Rather we want a theory which is invariant under local gauge transformations $U(x) \in \mathrm{SU}(N)$:

$$U(x) : \Psi_N(x) \mapsto \Psi'_N(x) = U(x) \Psi_N(x) . \tag{2.53}$$

A theory invariant under these transformations is called a gauge theory. It is possible to add to Eq. (2.51)
a term containing a new set of fields $A^a_\mu(x)$ such that it stays invariant under the transformation (2.53).
The simplest way to do this is to choose

$$\mathcal{L}(\bar{\Psi}_N, \Psi_N, A^a_\mu) = \bar{\Psi}_N \left(i\not{D} - m\right) \Psi_N , \tag{2.54}$$

where the covariant derivative $\not{D}$ is given by

$$\not{D} = \gamma^\mu \left(\partial_\mu + i g A^a_\mu\right) , \tag{2.55}$$

and the transformation of $A^a_\mu(x)$ must be given by

$$U(x) : A^a_\mu(x) \mapsto A'^a_\mu(x) = U^\dagger(x) \left(\partial_\mu + A^a_\mu(x)\right) U(x) , \tag{2.56}$$

meaning that the $A^a_\mu$ lie in the adjoint representation of SU($N$) and that $1 \leq a \leq \dim(\mathrm{SU}(N))$. Thus,
the resulting theory now contains the fields $\bar{\Psi}_N$, $\Psi_N$, and $A^a_\mu(x)$. The new fields $A^a_\mu(x)$ are termed
gauge fields, and their coupling to the fields $\Psi_N$ is given by the dimensionless coupling strength $g$.
By requiring invariance under the transformations (2.53) and (2.56) and requiring that
the Lagrangian only contains perturbatively renormalizable terms (see Sec. 2.3.1), one is finally led to
the general form

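$$\mathcal{L}(\bar{\Psi}_N, \Psi_N, A^a_\mu) = -\frac{1}{4} \sum_{a=1}^{N^2-1} F^a_{\mu\nu} F^{\mu\nu a} + \bar{\Psi}_N \left(i\not{D} - m\right) \Psi_N , \tag{2.57}$$

with the field strength

$$F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + g \sum_{b,c=1}^{N^2-1} f_{abc} A^b_\mu A^c_\nu . \tag{2.58}$$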

There is an important difference between the pure gauge part in the SU($N$) Lagrangian (2.57) and
the single gauge field Lagrangian (2.9) corresponding to an Abelian gauge group: The former contains
interactions between different components of the gauge field $A^a_\mu$, while the latter describes a true free
field. Thus, the $N$-component vector theory contains interactions even in the case of a pure gauge
theory without coupling to a matter field. It is argued below that this leads to the
dynamical generation of a mass scale in the quantized theory. This phenomenon is also
known as dimensional transmutation.

Since the group SU($N$) is non-Abelian (its elements do not commute), Eq. (2.57) is referred to as
a non-Abelian gauge theory. For alternative ways to define a gauge theory cf. [46, 24] and references
therein.

In addition to the SU($N$) symmetry, the Lagrangian (2.57) is also invariant under axial rotations of the
fermion fields, provided the Dirac part is massless ($m = 0$):

$$\Psi \mapsto \Psi' = \exp\left[\gamma_5 \alpha\right] \Psi , \qquad \bar{\Psi} \mapsto \bar{\Psi}' = \bar{\Psi} \exp\left[-\gamma_5 \alpha\right] . \tag{2.59}$$

The question arises whether this symmetry also exists on the quantum level, or whether it is broken by
an anomaly. As has been realized by Adler [47] and Bell and Jackiw [48], for an Abelian gauge
theory the latter is indeed the case. The anomaly responsible for breaking the axial current corresponding to
the symmetry (2.59) is known as the Abelian anomaly or ABJ-anomaly. It is present once the theory
contains fermions and is independent of the fermion masses. This result has also been derived
non-perturbatively by Fujikawa [49]. An extension to non-Abelian theories has been given in [50]. For a
textbook containing a rigorous mathematical treatment consult [51].

2.5 Quantum Chromodynamics

Now the ground has been prepared to formulate quantum chromodynamics (QCD) as the theory
underlying the strong interaction. It is a Yang-Mills gauge theory (see Sec. 2.4) symmetric under the
SU(3) group (as discussed in Sec. 2.1), where the latter symmetry group refers to the color degree of
freedom of the quarks. It contains six flavors of quarks with masses $\{m_k\}$, with each flavor of quarks
transforming as the fundamental triplet representation of the color group. The accompanying vector
bosons, the "gluons", transform according to the adjoint representation. Furthermore we require the
theory to be perturbatively renormalizable. Hence, the resulting Lagrangian $\mathcal{L}_{\text{QCD}}$ in Minkowski space
is given by (the number of colors is denoted by $N_c = 3$)

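$$\mathcal{L}_{\text{QCD}} = -\frac{1}{4} \sum_{a=1}^{8} G^a_{\mu\nu} G^{\mu\nu a} + \sum_{k=1}^{6} \sum_{\alpha=1}^{N_c} \bar{\Psi}_{k\alpha} \left(i\not{D} - m_k\right) \Psi_{k\alpha} . \tag{2.60}$$

It is also possible (without violating perturbative renormalizability) to add a term of the form

$$\propto \theta \sum_{a=1}^{8} G^a_{\mu\nu} \tilde{G}^{\mu\nu a} ,$$

with $\tilde{G}$ denoting the dual field strength, to (2.60). This term is known as the "$\theta$-term" and would be a
source of CP violation. The experimental limit for $\theta$ is $O(10^{-9})$. Thus, this term will not be considered in this thesis.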
Several important properties of QCD can be learned by considering the symmetries of (2.60) [35]. For $N_f$
massless quark flavors, $\mathcal{L}_{\text{QCD}}$ is invariant under several global symmetry transformations. In particular,
one can decompose the Dirac spinors into left- and right-handed quark fields and perform independent
rotations on the resulting Weyl spinors. This yields a global $\mathrm{SU}(N_f)_L \otimes \mathrm{SU}(N_f)_R$ symmetry (this
symmetry is also known as chiral). Furthermore one can make independent global vector and axial
rotations on the full Dirac spinors, resulting in a global $\mathrm{U}(1)_V \otimes \mathrm{U}(1)_A$ symmetry. When looking at the
masses of the different quark flavors, one can indeed consider the masses of the u- and d-quark flavors
to be almost zero compared to the typical scales of hadronic resonances. To a lesser extent this is also
valid for the s-quark flavor. Thus, QCD contains three almost massless fermion flavors and should
consequently have a global $\mathrm{SU}(3)_L \otimes \mathrm{SU}(3)_R \otimes \mathrm{U}(1)_V \otimes \mathrm{U}(1)_A$ symmetry.

According to the Noether theorem, there should be conserved charges corresponding to each symmetry of
the Lagrangian. The $\mathrm{U}(1)_V$-symmetry is indeed associated with a conserved quantum number, namely
the baryon number, which is conserved exactly by the strong interaction. The current corresponding to
the axial $\mathrm{U}(1)_A$-symmetry is, however, explicitly broken by the ABJ anomaly (cf. Sec. 2.4) if the theory
is quantized. Nonetheless, one can find a modified, conserved current, although it is gauge-dependent
and thus does not represent a physical current.

From the remaining chiral symmetry, one half is indeed present in the hadron spectrum, namely as the
flavor $\mathrm{SU}(3)_F$ symmetry discussed in Sec. 2.1. This half corresponds to a vector symmetry
transformation of the Dirac spinors. The other half, however, which corresponds to an axial vector transformation,
would result in a parity degeneracy of the particles which is clearly not observed. To be specific, there
are no parity degeneracies present in the hadron spectrum at all. Thus, the quantization of QCD must
break this symmetry. Since there is no anomaly which could account for this symmetry breaking,
it must be broken in a spontaneous manner, i.e. the ground state of the theory will not be invariant.
Due to the Goldstone theorem, there consequently exist massless particles; they correspond to the
pseudoscalar mesons, whose masses are much smaller than those of the other hadrons. The fact that
these masses are not zero can be attributed to the explicit breaking of chiral symmetry due to the small masses
of the light quarks. Within the framework of Chiral Perturbation Theory ($\chi$PT), it can
indeed be shown that for small quark masses the effect can be treated perturbatively.
But there does not seem to exist any Goldstone boson corresponding to the breaking of the axial $\mathrm{U}(1)_A$
charge. The only particle with the correct symmetries is the $\eta'$-meson, whose mass is far too large
(see Tab. 2.1). The solution of this problem is related to the topology of the gauge field. Topological
transitions can produce the $\eta'$-mass via the axial anomaly. A possible explanation is that instanton
transitions (see below) are responsible for these topological charge fluctuations.

2.5.1 Running Coupling and Energy Scales 

The quantum theory built upon (2.60) is characterized by a running coupling (for details see e.g. [35]).
Performing a leading order perturbative analysis and renormalizing the theory, the behavior of the
running coupling "constant" is found to be

$$\alpha_s(Q^2) = \frac{4\pi}{\beta_0 \ln\left(Q^2/\Lambda_{\text{QCD}}^2\right)} , \tag{2.61}$$

with $\beta_0 = 11 - \frac{2}{3} N_f$, where $N_f$ is the number of active flavors [2]. This defines the coupling at an
energy scale $Q^2$. There are two important lessons to be learned from (2.61):

• The coupling $\alpha_s(Q^2)$ decreases for increasing values of $Q^2$. The interaction vanishes for $Q^2 \to \infty$
and the particles become free in this limit. This property is referred to as asymptotic freedom.

• The coupling becomes infinite for a certain finite value of $Q^2$, set by $\Lambda_{\text{QCD}}$. This happens also in case of
an Abelian gauge theory (where the underlying group is U(1)) and exposes an intrinsic inconsistency in the
assumptions under which (2.61) has been derived: The assumption that $\alpha_s(Q^2)$ is small becomes invalid for
increasing $\alpha_s(Q^2)$ at some point, and the series starts to diverge already at the first order beyond
tree level. This singularity is called the Landau pole and is considered to be an unphysical
remnant, only present due to the fact that perturbation theory cannot be applied for too large
expansion parameters. The appearance of the Landau pole thus sets a limit to the applicability
of perturbative calculations. On the other hand, one can expect the calculation to be valid at
energies far larger than $\Lambda_{\text{QCD}}$.
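
Both lessons can be checked numerically. The sketch below assumes the standard leading-order form $\alpha_s(Q^2) = 4\pi / (\beta_0 \ln(Q^2/\Lambda_{\text{QCD}}^2))$ together with $\beta_0 = 11 - \frac{2}{3} N_f$; the function names and the value `lambda_qcd = 0.25` (in GeV) are illustrative assumptions, not numbers taken from this thesis.

```python
import math


def beta0(n_f):
    """Leading-order coefficient beta_0 = 11 - 2/3 N_f."""
    return 11.0 - 2.0 * n_f / 3.0


def alpha_s(q2, n_f=3, lambda_qcd=0.25):
    """Leading-order running coupling at Q^2 (GeV^2).

    lambda_qcd (GeV) is an illustrative value. The expression is only
    meaningful well above the Landau pole at Q^2 = lambda_qcd^2.
    """
    log_term = math.log(q2 / lambda_qcd ** 2)
    if log_term <= 0.0:
        raise ValueError("Q^2 is at or below the Landau pole")
    return 4.0 * math.pi / (beta0(n_f) * log_term)


# Asymptotic freedom: the coupling decreases as Q^2 grows.
couplings = [alpha_s(q2) for q2 in (10.0, 100.0, 10_000.0)]

# Approaching the pole from above, the leading-order coupling grows without bound.
near_pole = alpha_s(1.0001 * 0.25 ** 2)
```

Evaluating the function at or below $Q^2 = \Lambda_{\text{QCD}}^2$ raises an error on purpose, since the leading-order expression has no meaning there.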

It is usually assumed that, when "solving" full QCD by the methods sketched in Sec. 2.3.3, one also
obtains the whole low-energy phenomenology with minimal input. There is no reason why the failure
of a single method, namely the perturbative expansion around the free field, should imply that QCD
is not valid at low energy scales. However, a concise solution of interacting quantum field theories is
not in sight, so one has to make do with a number of models parameterizing the low-energy behavior. One
of these parameterizations is $\chi$PT [52]. Besides the latter, there are also different effective theories
which parameterize the behavior of the strong interaction at low energies: models like the
Nambu-Jona-Lasinio model [53], the skyrmion model [54], or models based on instantons (see below) are different
attempts to describe the properties of low energy strong interactions. The hadrons built up from one
of the three heavy quarks can be described using Heavy-Quark Effective Theory (HQET), see [55, 56]
for introductions.

However, all these theories are only able to predict the low-energy properties of the strong interaction;
they do not incorporate an adequate mechanism for the description of the parton content of hadrons.
For the high energy regime, the perturbative treatment of QCD has to be used, which describes the
interaction using the color group with the gluons being the mediators of the strong force. However, if
the strong interaction is described using the flavor group as an interaction between the baryons (the
octet multiplet in the flavor SU(3) group), then the mediating particles are the pseudoscalar mesons.
One particularly important concept in the development of QCD is the hypothesis of confinement. The
common understanding of confinement is that in a world without sea quarks, the static potential of
two quarks would grow linearly without limit. This leads to bound quarks not being separable and
thus free quarks being unobservable. One consequence of this picture of confinement could be that the
classical limit, Eq. (1.2), may not exist. Thus, the consequences of confinement could be wide-reaching.
The best tools which have so far been used to address this particular issue are lattice simulations. For
a recent discussion of lattice simulations regarding confinement, see [57] and references therein.



There is another very important property of QCD shared with other non-Abelian gauge theories:
Consider (2.57) without fermions. Then it can be shown [58] that there exist gauge field configurations
which vanish at spatial infinity, but fall into different topological classes. They may be characterized by
the winding number $n(U)$, which is given by the Chern-Simons three-form on the gauge fields [59, 51].
The transition between the different topological sectors may be performed using the instanton solutions.
These are solutions of the classical equations of motion and they may also contribute significantly in the
quantized theory. For recent overviews consult [60, 61, 62]. The importance of instantons for hadron
physics has also been demonstrated on the lattice in [63, 64]. Recently, a method to examine a
prediction of the instanton model with lattice simulations has been proposed in [65]. This method has been
applied in [66], confirming the predictions of the instanton model. Indications for this picture have also
been found in an earlier publication [67] and in later works [68, 69].

One particularly important point is that the quantum field theory built upon (2.57) puts a lower limit
on the magnitude of the instanton actions, resulting in a certain mass scale of the theory. Thus, a
mass scale is generated although the classical theory is scale-free (and has no free parameters except
for the coupling $g$, which can be rescaled to any value). As has already been mentioned in Sec. 2.4,
this phenomenon is known as dimensional transmutation. A different widely discussed manifestation of
dimensional transmutation is the existence of glue-balls (see e.g. [70] for a recent overview).

2.5.2 Factorizable Processes 

Several observables in QCD (like structure functions, form factors etc.) depend on input from both
regimes. For several interesting processes involving these observables, a method known as factorization
is applicable. The formal framework of factorization is the operator-product expansion, whenever it applies.
Consider two local operators $A(x)$, $B(y)$. The Wilson expansion of the time-ordered product of the composite
operator for short distances $(x - y) \to 0$ can then be performed as [71]

$$T\left(A(x) B(y)\right) = \sum_i C_i(x - y)\, N_i(x) . \tag{2.62}$$

This relation is only established perturbatively, however. The singularities of the composite operator
$T\left(A(x) B(y)\right)$ are then contained in the $\{C_i\}$, which are c-numbers. They are called Wilson coefficients
and contain the high-energy physics. Consequently, they can be computed perturbatively. The operators
$\{N_i\}$ are local operators containing information about the low-energy regime and hence are usually not
accessible by perturbative methods. The individual terms in the sum (2.62) can be arranged in such an
order that the single terms behave as a power series in $Q^{-2N}$, where $N$ characterizes the order of the
associated term. This is done by ascribing a certain "twist" to each term. The first term (which vanishes
slowest) is called the "leading twist contribution" and the higher terms are consequently "higher twist
contributions". The series then takes a form reminiscent of the perturbative expansion, Eq. (2.42).
In the form of (2.62), the high energy regime and the low energy part can be treated separately, and
the object under consideration factorizes into the two separate contributions. The major ingredient of a
factorization scheme is the factorization scale, i.e. the scale deciding which contributions belong to the
low-energy regime and thus to the operators $N_i(x)$, and which contributions belong to the high energy
part, i.e. the functions $C_i(x - y)$. This leaves a certain freedom in the application of the factorization
approach. This freedom should be exploited to keep higher-order corrections in the perturbative series
as small as possible, shifting the majority of contributions into the leading order.

Naturally the question arises to what extent it is possible to ascribe any meaning to a series like (2.62)
if it involves a running coupling (2.61) which is singular at some point in the physical parameter space.
This question has been addressed in e.g. [75]. From a pragmatic point of view one can adopt the series
despite the conceptual problems. However, one has to circumvent the Landau singularity; to achieve
this, a number of proposals have been made: one is to apply a "freezing" prescription, i.e. simply hold
the coupling constant fixed below a certain point [76]. Another consists of introducing an effective gluon
mass [77]. A different approach relies on the application of an analytization procedure (first applied to
QED by Lehmann and then Bogoliubov, see [78, 79, 80] and references therein), which was originally
invented to extend (2.61) also to the regime where $Q^2$ is a timelike momentum transfer [81, 82]. Later a
framework of analytic perturbation theory has been founded on this basis by Shirkov and Solovtsov
in [83, 84, 85, 86]. In essence, the Landau singularity in Eq. (2.61) can be compensated in a minimal
way by adding a unique power term, replacing the running coupling by

$$\alpha_s^{\text{an}}(Q^2) = \frac{4\pi}{\beta_0} \left[ \frac{1}{\ln\left(Q^2/\Lambda_{\text{QCD}}^2\right)} + \frac{\Lambda_{\text{QCD}}^2}{\Lambda_{\text{QCD}}^2 - Q^2} \right] . \tag{2.63}$$
In contrast to the conventional expansion, the contribution of higher terms appears to be suppressed
(cf. [84]). This observation, together with a renormalization and factorization scheme optimized for
putting most higher order contributions into the leading order, should allow for a consistent and efficient
description of factorizable processes. Indeed, it has been found that this program works for the cases
of the electromagnetic form factor of the pion and the $\pi \to \gamma^* \gamma$ transition form factor [87, 88, 89] and
yields an excellent agreement with the experimental data while providing a consistent framework for
the computation of hadronic observables.
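
The effect of the compensating power term can be made explicit with a small numerical sketch. It assumes the one-loop analytic coupling in the Shirkov-Solovtsov form, $\frac{4\pi}{\beta_0}\left[1/\ln x + 1/(1 - x)\right]$ with $x = Q^2/\Lambda_{\text{QCD}}^2$; the function names and the value `lambda_qcd = 0.25` (GeV) are illustrative assumptions.

```python
import math


def beta0(n_f):
    """Leading-order coefficient beta_0 = 11 - 2/3 N_f."""
    return 11.0 - 2.0 * n_f / 3.0


def alpha_s_analytic(q2, n_f=3, lambda_qcd=0.25):
    """One-loop analytic coupling (assumed form), finite for all Q^2 > 0."""
    x = q2 / lambda_qcd ** 2
    prefactor = 4.0 * math.pi / beta0(n_f)
    if abs(x - 1.0) < 1e-8:
        # At x = 1 the two singular terms cancel; the limit of the bracket is 1/2.
        return prefactor * 0.5
    return prefactor * (1.0 / math.log(x) + 1.0 / (1.0 - x))


prefactor = 4.0 * math.pi / beta0(3)
at_pole = alpha_s_analytic(0.25 ** 2)             # exactly at the former Landau pole
just_above = alpha_s_analytic(1.0001 * 0.25 ** 2)
infrared = alpha_s_analytic(1e-12)                # deep infrared
ultraviolet = alpha_s_analytic(1e6)               # far above lambda_qcd^2
```

In this form the coupling stays bounded by $4\pi/\beta_0$ in the infrared, while at large $Q^2$ the power term is negligible and the leading-order logarithmic behavior is recovered.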



2.5.3 Lattice QCD

The approach of performing a numerical simulation on a finite lattice, yielding an approximation to the
Schwinger functions $\mathcal{S}_n$ in discrete Euclidean space, is referred to as lattice gauge theory and
provides in principle the only means known so far to access the complete structure of both the low and
high energy regime of QCD. However, due to the technical difficulties inherent to this method, the
quality of results is poor when compared to perturbation theory (whenever the latter is applicable).
Thus, contemporary lattice investigations always concentrate on the non-perturbative regime of QCD,
calculating the properties of the low-energy parameterizations.

As will be shown in Sec. 2.6.4, there are problems concerning the formulation of massless fermions on
the lattice. On the other hand, $\chi$PT, as a low energy model of QCD, performs an expansion in the quark
mass around the point $m_q = 0$ and thus allows for a systematic treatment of near-massless fermions;
for this reason, it is of particular interest for lattice investigations, since one is usually interested in
performing extrapolations in the quark mass. $\chi$PT
is, however, limited to the continuum theory. Thus, the continuum extrapolation should precede the
application of $\chi$PT.

Since in lattice simulations one often chooses quark masses occurring in virtual quark loops (the so-called
"sea quarks") different from the quark masses appearing in hadrons (the so-called "valence quarks"), an
extension of the original $\chi$PT formulation is necessary to handle these models as well. The first extension
was to remove the sea quarks altogether (the quenched approximation, which formally corresponds to
infinitely heavy sea quarks), yielding "quenched chiral
perturbation theory" (for a short discussion and the references, see [50]). This model allows for the
extraction of phenomenology from lattice simulations if one completely disregards dynamical fermion
contributions.

With the advent of dynamical fermion simulations, a further extension of this model introducing different
masses for sea and valence quarks was proposed by Bernard and Golterman in [90], resulting in the
"partially quenched chiral perturbation theory". In principle, partially quenched chiral perturbation
theory should allow for the first time to gain direct access to phenomenological quantities from lattice
simulations, provided a number of conditions is met [10]. In essence, one has to perform simulations
with three dynamical quark flavors (which may even be mass-degenerate) at rather small masses of
about $\frac{1}{4} m_s$. This goal is out of reach with the resources available to the lattice community today, but
it may pave the way for future lattice simulations aiming at precise measurements of hadron properties.
While quenched simulations already allow for a rather precise determination of many phenomena in
QCD [7, 8], there are observables which depend also on dynamical fermion contributions. For example,
the mass of the $\eta'$ meson (see above) is only properly accessible in unquenched simulations (see [11] for
a discussion).

The different methods for computations in QCD are visualized in Fig. 2.7.

Figure 2.7: Different methods for obtaining predictions in QCD.



2.6 Discretization 



As discussed in Sec. 2.3.3, for the construction of a quantum field theory on a lattice, the functional $S[\varphi(x)]$
is required. This functional should reduce to the Euclidean action $S[\varphi(x)]$ in the continuum limit and
for differentiable paths. Before applying the limit prescription, it will thus differ by $O(a)$-effects from
the continuum expression, meaning that in general the choice of $S[\varphi(x)]$ is not unique but still leaves
the freedom to choose all terms of order $O(a^n)$ with $n \geq 1$. This freedom should be used to find the form
best suited for numerical calculations.



2.6.1 Scalar Fields 

Consider the complex field $\phi(x)$ defined on the sites $x \in \mathbb{Z}^4_\Omega$. The continuum Lagrangian corresponding
to this situation is given by Eq. (2.6). One candidate for the lattice version of the action is then given
by

$$S[\phi(x)] = \sum_x \left( \sum_{\mu=0}^{3} \phi(x)^\dagger \phi(x + \hat{\mu}) + m^2 \phi(x)^\dagger \phi(x) + V[\phi(x)] \right) . \tag{2.64}$$

There are certainly other ways to replace the derivative, but the present choice is the simplest way to
incorporate neighbor fields. Consequently, this choice is suitable for numerical investigations and will
be used in this thesis.

A particularly interesting model is the so-called $\phi^4$-model, where one sets

$$V[\phi] \propto \phi^4 .$$

This model appears to be an interacting, nontrivial field theory at first sight, but it was conjectured
early on [31] that it might only give rise to a non-interacting theory of free particles. In later
investigations this surmise has been corroborated [91, 92]. However, a rigorous proof is still missing.

2.6.2 Gauge Fields 

In the continuum form (2.57), the gauge field is given in terms of parallel transporters along infinitesimal
distances. On a lattice, the shortest non-zero distance is the lattice spacing $a$.
The parallel transporter connecting a point $x \in \mathbb{Z}^4_\Omega$ with its neighbor $x + \hat{\mu}$ is denoted by $U(x, x + \hat{\mu})$.
It is an element of SU($N$). The simplest gauge-invariant object one can construct is a closed loop with
a side length of one lattice unit, usually called the plaquette. Starting from the point $x \in \mathbb{Z}^4_\Omega$, one can
construct the plaquette lying in the $\mu\nu$-plane by considering

$$U_{\mu\nu}(x) = U(x, x + \hat{\mu})\, U(x + \hat{\mu}, x + \hat{\mu} + \hat{\nu})\, U(x + \hat{\mu} + \hat{\nu}, x + \hat{\nu})\, U(x + \hat{\nu}, x) . \tag{2.65}$$

Due to the fact that $U(x, x + \hat{\mu}) = U^\dagger(x + \hat{\mu}, x)$ one can rewrite (2.65):

$$U_{\mu\nu}(x) = U(x, x + \hat{\mu})\, U(x + \hat{\mu}, x + \hat{\mu} + \hat{\nu})\, U^\dagger(x + \hat{\nu}, x + \hat{\mu} + \hat{\nu})\, U^\dagger(x, x + \hat{\nu}) . \tag{2.66}$$

The suggestion of Wilson [93] was to use the real part of the trace of $U_{\mu\nu}(x)$, summed over all plaquettes, as
the action of the system,

$$S_g[U(x)] = \beta \sum_x \sum_{\mu\nu} \left( 1 - \frac{1}{N} \mathrm{Re}\, \mathrm{Tr}\, U_{\mu\nu}(x) \right) , \tag{2.67}$$

which is the discretized form of the non-Abelian gauge field part in Eq. (2.57). This form, however, is
also applicable to the case of Abelian gauge fields with the Lagrangian given by Eq. (2.9).
For later applications, the more convenient notation $U_\mu(x) = U(x, x + \hat{\mu})$ will be used from now on;
hence, the plaquette, Eq. (2.66), is written as

$$U_{\mu\nu}(x) = U_\mu(x)\, U_\nu(x + \hat{\mu})\, U^\dagger_\mu(x + \hat{\nu})\, U^\dagger_\nu(x) . \tag{2.68}$$
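
The gauge invariance of the plaquette can be checked numerically. The following minimal sketch uses the Abelian group U(1) on a small two-dimensional periodic lattice purely for brevity (for SU($N$) the trace of the plaquette would take the place of the plaquette itself). The lattice size `L`, the helper names and the dictionary layout of the links are illustrative assumptions.

```python
import cmath
import random

L = 4  # linear size of a periodic two-dimensional lattice (illustrative)


def shift(x, mu):
    """Return the neighbor x + mu_hat with periodic boundary conditions."""
    y = list(x)
    y[mu] = (y[mu] + 1) % L
    return tuple(y)


def sites():
    return [(i, j) for i in range(L) for j in range(L)]


def random_links(rng):
    """U(1) link variables U_mu(x) = exp(i theta), stored as links[(x, mu)]."""
    return {(x, mu): cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi))
            for x in sites() for mu in range(2)}


def plaquette(links, x, mu, nu):
    """U_mu(x) U_nu(x + mu) U_mu(x + nu)^dagger U_nu(x)^dagger."""
    return (links[(x, mu)] * links[(shift(x, mu), nu)]
            * links[(shift(x, nu), mu)].conjugate()
            * links[(x, nu)].conjugate())


def gauge_transform(links, g):
    """U_mu(x) -> g(x) U_mu(x) g(x + mu)^dagger for site phases g(x)."""
    return {(x, mu): g[x] * u * g[shift(x, mu)].conjugate()
            for (x, mu), u in links.items()}


rng = random.Random(0)
links = random_links(rng)
g = {x: cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for x in sites()}
transformed = gauge_transform(links, g)

# Largest change of any plaquette under the local transformation.
max_change = max(abs(plaquette(links, x, 0, 1) - plaquette(transformed, x, 0, 1))
                 for x in sites())
```

Individual links change under the transformation, but every closed loop picks up each site phase together with its complex conjugate, so the plaquettes agree up to rounding.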

The path integral form of the partition function, Eq. (2.39), on a finite lattice $\mathbb{Z}^4_\Omega$ is given by

$$Z = \int \prod_x dU(x)\, \exp\left[-S[U(x)]\right] , \tag{2.69}$$

where the measure $dU(x)$ is the Haar measure on the gauge group (see Appendix B).
One of the properties of the Haar measure is that the total integral over the group space on a single
lattice point $x$ is finite. Thus, any gauge-fixing procedure is unnecessary [24]. However, if one attempts
to apply a saddle-point approximation to (2.69) (see Sec. 2.3.3), the presence of zero modes will spoil
the inverse of the two-point function [35]. This requires the introduction of a gauge fixing procedure
and auxiliary fields known as Faddeev-Popov ghosts [35].

It is important to point out that the entire partition function (2.69) is manifestly gauge invariant since
it is composed of gauge invariant loops $U_{\mu\nu}(x)$ only. It can be shown [46, 24] that it is impossible to
break gauge invariance spontaneously. This fact implies that the expectation value of the $U_\mu(x)$ will
always vanish, a fact which is also known as the Elitzur theorem:

$$\langle U_\mu(x) \rangle = 0 .$$






Eq. (2.67) has two limits in which the form of the gauge fields can be written down explicitly. If one
considers $\beta \to 0$, the situation resembles the high-temperature limit in thermodynamics. Thus, the
resulting gauge field configurations are called hot configurations: the values of the gauge field variables
are arbitrary and can take any random value from their domain of definition. The limit $\beta \to \infty$
is referred to as the zero-temperature limit. The correlation between neighboring points increases, and
therefore the total correlation length increases. Finally, the values of the field variables are given
by $U_\mu(x) = 1 \;\forall x \in \Omega$. If the system undergoes a second order phase transition, i.e. if the correlation
length diverges, this describes the continuum quantum field theory.
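
The two limiting configurations are easy to construct explicitly. The sketch below again uses U(1) link angles on a small two-dimensional periodic lattice as a stand-in for the SU($N$) case; the lattice size, the seed and all function names are illustrative assumptions. It evaluates the average plaquette and the corresponding Wilson-type action $\beta \sum (1 - \mathrm{Re}\, U_{\mu\nu})$ for a cold and a hot start.

```python
import math
import random

L = 8  # linear size of a periodic two-dimensional lattice (illustrative)


def cold_start():
    """beta -> infinity: every link equals the unit element (angle 0)."""
    return {(i, j, mu): 0.0
            for i in range(L) for j in range(L) for mu in range(2)}


def hot_start(rng):
    """beta -> 0: every link angle is drawn uniformly from its domain."""
    return {(i, j, mu): rng.uniform(-math.pi, math.pi)
            for i in range(L) for j in range(L) for mu in range(2)}


def average_plaquette(theta):
    """Mean of Re U_plaquette = cos(plaquette angle) over all sites."""
    total = 0.0
    for i in range(L):
        for j in range(L):
            angle = (theta[(i, j, 0)] + theta[((i + 1) % L, j, 1)]
                     - theta[(i, (j + 1) % L, 0)] - theta[(i, j, 1)])
            total += math.cos(angle)
    return total / (L * L)


def wilson_action(theta, beta):
    """beta times the sum over plaquettes of (1 - Re U_plaquette), for U(1)."""
    return beta * L * L * (1.0 - average_plaquette(theta))


cold = cold_start()
hot = hot_start(random.Random(3))
```

For the cold start the average plaquette is exactly one and the action vanishes, whereas for the hot start the plaquette angles are uniformly distributed and the average plaquette fluctuates around zero.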



2.6.3 D-Theory

There also exists a discretization technique entirely different from the methods discussed in Sec. 2.6.2.
It is based on a quantum link model, which is constructed in such a way that it is still locally gauge
invariant and should thus reduce to the correct continuum form of the Yang-Mills theory, just as (2.67)
is expected to do. Quantum link models with local gauge invariance were first formulated by Horn in
[94], where a model with a local SU(2)$\otimes$U(1) gauge invariance has been constructed.
They have been extended to the case of SU($N$) by Orland and Rohrlich in [95]. These models,
however, did not yet relate to the quantum field theories with continuous symmetries as discussed in
this thesis. Only recently has it been realized by Chandrasekharan and Wiese in [96] how one can
relate the discrete quantum link models to continuum field theories. For a review of the ingredients
of quantum link models see [97].

Globally Symmetric Models 

The basic ideas behind the construction of D-theory become clear if one considers a spin model, namely
the O(3)-model in two space dimensions. The action is given by (cf. (2.67))

$$S[s] = -\frac{1}{g} \sum_{x,\mu} s(x) \cdot s(x + \hat{\mu}) , \tag{2.70}$$

with $s(x)$, $x \in \mathbb{Z}^2_\Omega$, being three-component unit vectors. The coupling constant is given by $g$. After
quantizing this spin system (cf. Sec. 2.3.3) by considering the partition sum

$$Z = \int \prod_x ds(x)\, \exp\left[-S[s]\right] , \tag{2.71}$$

one arrives at a model which is asymptotically free and has a non-perturbatively generated mass gap.
The question arises whether it is possible to find a different lattice system (with no resemblance to
Eq. (2.70)) which still reduces to the same continuum field theory in the sense discussed in Sec. 2.3.3.
This construction is indeed possible and can be done as follows:

1. Replace the classical vectors $s(x)$ by quantum spin operators $S(x)$ which are elements of the
algebra, $S(x) \in su(2)$, i.e. they are the generators of the SU(2)-group (cf. Appendix B.3).

2. Replace the classical action (2.70) by the Hamiltonian

$$H = J \sum_{x,\mu} S(x) \cdot S(x + \hat{\mu}) , \tag{2.72}$$

which yields the quantum Heisenberg model. The partition sum (2.71) is therefore replaced by
the state sum

$$Z = \mathrm{Tr} \exp\left[-\beta H\right] . \tag{2.73}$$

It is important to point out that the particular representation of the group is not important:
the trace can be taken over any representation, although in practice one usually adopts the
fundamental representation [57]$^4$. In the following, the discussion is restricted to the case $J > 0$,
i.e. the anti-ferromagnetic Heisenberg system.

3. By using a Suzuki-Trotter discretization, the state sum (2.73) becomes a partition function of a
three-dimensional model with continuous symmetry with a certain lattice spacing $a_{3d}$. This model
is invariant under a global SO(3)-symmetry since the Hamiltonian (2.72) is also invariant. The
low-energy properties of the resulting model can be described using chiral perturbation theory
[98]. The symmetry is spontaneously broken in the ground state, resulting in two Goldstone
bosons which are represented by fields in the coset SO(3)/SO(2) $= S^2$. Thus, they describe the
same kind of three-component unit vectors which appear in the original action, Eq. (2.70). The
low-energy effective action of the Goldstone bosons can be formulated using chiral perturbation
theory:

$$S[s] = \int_0^{L_0} dx_0 \int d^2x\, \frac{\rho_s}{2} \left( \partial_\mu s \cdot \partial^\mu s + \frac{1}{c^2}\, \partial_0 s \cdot \partial^0 s \right) , \tag{2.74}$$

with $L_0$ being the extent of the third dimension which has been introduced by the
Suzuki-Trotter discretization. The parameters $c$ and $\rho_s$ constitute the spin-wave velocity and the stiffness,
respectively.

4. Finally, there exists a mapping between the two systems, which has been suggested by Hasenfratz
and Niedermayer j^H] . This is achieved by a block spin transformation, which maps subvolumes
of size $\Omega_{cub} = L_0 \times (L_0 c)^2$ to a new lattice system. The new lattice will then have a lattice
spacing given by $a_{2d} = L_0 c$ and the coupling constant $g$ of the transformed system is given by

$$ 1/g = L_0 \rho_s + O\left( 1/(L_0 \rho_s) \right) . \tag{2.75} $$

Thus, the continuum limit of the new lattice model is obtained in the limit $L_0 \to \infty$. The
correlation length $\xi$ (and thus the inverse mass scale of the system) is given in terms of $L_0$ by

$$ \xi = \frac{e\, c}{16 \pi \rho_s} \exp\left( 2 \pi L_0 \rho_s \right) \left( 1 - \frac{1}{4 \pi L_0 \rho_s} + O\left( 1/(L_0 \rho_s)^2 \right) \right) . \tag{2.76} $$

In the limit $L_0 \to \infty$, the correlation length thus diverges exponentially, the extent $L_0 \ll \xi$
becomes negligible, and hence the system undergoes dimensional reduction.
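
The exponential growth of the correlation length with the extent of the extra dimension can be made concrete with a small numerical sketch. It assumes the form $\xi = \frac{e c}{16\pi\rho_s} \exp(2\pi L_0 \rho_s) (1 - 1/(4\pi L_0 \rho_s))$ for Eq. (2.76) and uses purely illustrative units with $\rho_s = c = 1$; the function name and the parameter values are hypothetical and not taken from the text.

```python
import math

def correlation_length(L0, rho_s, c):
    """Leading terms of the correlation length, assuming the form of Eq. (2.76)."""
    x = L0 * rho_s
    prefactor = math.e * c / (16.0 * math.pi * rho_s)
    return prefactor * math.exp(2.0 * math.pi * x) * (1.0 - 1.0 / (4.0 * math.pi * x))

# Illustrative units (assumption): rho_s = c = 1
rho_s, c = 1.0, 1.0
extents = (1.0, 2.0, 3.0, 4.0)
ratios = [correlation_length(L0, rho_s, c) / L0 for L0 in extents]

for L0, ratio in zip(extents, ratios):
    # xi / L0 >> 1 signals that the extent L0 becomes negligible
    print(f"L0 = {L0:.0f}: xi / L0 = {ratio:.3e}")
```

Already for moderate values of $L_0 \rho_s$ the ratio $\xi / L_0$ becomes very large, which is the statement of dimensional reduction in this toy setting.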

In conclusion, one can say that D-theory introduces a substructure to the original system. The lattice 
spacing of this substructure is much smaller than the corresponding lattice spacing of the original theory. 
However, the resulting lattice action is obtained from exact blocking of the continuum fields, implying 
that the lattice artifacts are of order $O(a_{3d})$. This means that in practical simulations, one can use
lattice spacings of the same order of magnitude as with the Wilson discretization and the resulting
theory has a lattice spacing $a_{2d} \gg a_{3d}$.

Models with Local Gauge Symmetries 

The construction principle underlying D-theory can be applied to other models as well. The important 
cases of U(l) and SU(2) gauge theories have been discussed in [HSj- The application to the case of QCD 
has been considered in [100]. For a review consult

4 This choice allows one to restrict to the smallest possible Hilbert space 
In [102], it has been conjectured how the parameter space of the D-theory formulation is related to the
coupling of the conventional theory. The principal chiral model can also be formulated in this
way and has been shown to reduce to the conventional discretization [103]. From these
discussions it becomes clear that D-theory is in fact an alternative formulation of the discretization of
quantum field theories with local gauge symmetries.

Simulation Algorithms 

For the simulation of quantum spin systems, a particularly efficient class of algorithms is available,
known as cluster algorithms. While the most efficient algorithms to be discussed in Chapter 3 which
are applicable to the Wilson action, Eq. (2.67), are all local, the cluster algorithms are global.
Cluster algorithms have first been introduced to quantum spin systems by Swendsen and Wang in
[104]. These algorithms exploit the mapping introduced by Fortuin and Kasteleyn [105] to rewrite
the partition function and to formulate a global algorithm which is able to flip a large cluster of spins
at once. In this way, critical slowing down, which will be discussed in Sec. 3.2, is effectively reduced,
provided the average cluster size scales proportionally to the correlation length of the system. For a
general overview of cluster algorithms see [106]. A useful generalization of cluster algorithms which
might be applicable to D-theory is given by the world-line Monte-Carlo algorithms, see [107, 108, 109]
and [110] for a new implementation.

If indeed locally gauge symmetric models can be simulated efficiently using a quantum spin system,
the inclusion of dynamical fermions would be straightforward [97]. Thus, full Yang-Mills theory might
be efficiently simulated. There is furthermore reason to believe that the fermionic sign problem
may be handled better in the framework of quantum spin systems. For an overview see
[111]. For further reading consult [112].

This benefit could then be used to overcome the limitations of current algorithms regarding the sign of 
the fermionic determinant. This problem occurs whenever an odd number of dynamical fermion flavors 
is being simulated very close to the limit of massless fermions. This point will be discussed in
Secs. 2.6.4 and 5.3.



2.6.4 Fermion Fields 

The Euclidean space version of (2.54) is given by



N 



(2.77) 



where the $\gamma_\mu$-matrices in Euclidean space must be employed, cf. App. A.1. The representation of
Eq. (2.77) on the lattice is a very complicated task. As shown in App. B.6, the basic fields $\psi(x)$, $\bar\psi(x)$
are elements of a Grassmann algebra. These fields admit a representation as four-component vectors
with the choice of $\{\gamma_\mu\}$ as given in App. A.1. Thus, the task of putting an $N$-component Yang-Mills
field in Euclidean space on the lattice is equivalent to finding a matrix $Q_{ab,\mu\nu}(y,x)$ with $a, b = 1, \dots, N$,
$\mu, \nu = 0, \dots, 3$, and $x, y \in \mathbb{Z}_\Omega$, giving rise to the action

$$ S_f = \sum_{xy} \sum_{ab} \sum_{\mu\nu} \bar\psi_{a\mu}(y)\, Q_{ab,\mu\nu}(y,x)\, \psi_{b\nu}(x) . \tag{2.78} $$

To simplify the notation, the indices $a, b$ and $\mu, \nu$ will be suppressed from now on. The corresponding
path integral defining the quantum partition function is then given by (cf. Eq. (B.18) in App. B)

$$ Z = \int [d\bar\psi]\, [d\psi]\, \exp\left[ -S_f \right] = \det Q . \tag{2.79} $$
For the discretization of the fermionic action, a number of choices are available. However, the
Nielsen-Ninomiya theorem [113, 114] puts a general limit on any lattice fermion action: under some natural
assumptions on the lattice action, it follows that there is an equal number of left- and right-handed
particles for every set of quantum numbers.

This implies that on the lattice the fermion spectrum consists of pairs of fermions and fermion-mirrors.
Thus, it appears to be impossible to implement the structure of Dirac fermions on a discrete
space-time. However, one can evade the physical consequences by decoupling the superfluous fermion
states. In QCD this can be achieved, for instance, by giving the fermion doublers a mass proportional
to the cut-off $a^{-1}$. This procedure, however, does not work in a chirally symmetric model; in fact, it is a
general consequence of the topological character of lattice theory that there does not exist a regularized
chiral fermion theory that has the following properties (for a proof of this no-go theorem see [115]):

1. global invariance under the gauge group, 

2. a different number of left- and right-handed species for given charge combinations, 

3. the (correct) Adler-Bell-Jackiw anomaly,

4. and an action bilinear in the Weyl field.

The absence of the Adler-Bell-Jackiw anomaly reflects the fact that the axial $U_A(1)$ current is conserved
because of the cancellation of opposite-handed species.

Of course, in the continuum formulation any gauge invariant regularization scheme yields the same
expression for the axial anomaly. This should also be valid for the lattice regularization.
Consequently, any candidate for the lattice discretization of gauge theories should reproduce the axial
anomaly in the continuum limit. Indeed, it has been shown that the Wilson discretization [117]
does reproduce the chiral anomaly in the continuum limit. The Wilson action breaks chiral symmetry on
the lattice explicitly, thus removing the unwanted doublers from the propagators. The chiral symmetry
breaking term is actually an irrelevant contribution to the lattice Ward identity, i.e. it is proportional to
the lattice spacing, $a$. However, it does not disappear in the limit $a \to 0$, but rather accounts precisely
for the anomaly. For a discussion of the phase structure associated with Wilson fermions on the lattice
consult [118].

A theorem showing that, under the rather general conditions of locality, gauge covariance and the
absence of species doubling, the lattice action gives rise to the axial anomaly has been given in [119, 120]
for Abelian gauge theories and generalized to the case of QCD (which can in principle be generalized
to any non-Abelian gauge theory) in [121]. However, the axial flavor mixing current should be
non-anomalous. That this is indeed the case has been shown in [122]. The proofs have all been done
perturbatively on the lattice using the expansion from [123].

The problem of representing chiral symmetry on the lattice has been resolved only recently, when
it was realized that a solution of the Ginsparg-Wilson relation (GWR) introduced in [124] has an
exact chiral symmetry on the lattice, as has first been discussed in [125]. The first fermionic action
which actually satisfies the GWR was the perfect action of [126]. For practical purposes, the solution of
Neuberger [127, 128] is the most widely used today (for a historical overview of the development leading
to the Neuberger representation, see [129]). Finally, it is important to point out that the theorem in
[121] also applies to Ginsparg-Wilson fermions, thus ensuring that they reduce to the correct fermionic
action in the continuum limit.

However, since the numerical effort for the evaluation of Neuberger fermions increases by one to two orders
of magnitude compared to Wilson fermions, the calculation with dynamical Neuberger fermions is still
prohibitively expensive.

As argued above, the Wilson action breaks chiral symmetry on the lattice with a term of order $O(a)$.
Thus, the action depends linearly on the cutoff and physical observables might show sizable lattice
artifacts when approaching the continuum limit. As has been put forward by Sheikholeslami and
Wohlert in [130], the cancellation of the $O(a)$ dependence can be calculated perturbatively up to a
prefactor, the parameter $c_{sw}$. Observables computed using this fermionic action with a non-perturbatively
calculated $c_{sw}$ indeed show weaker artifacts with an $O(a^2)$-dependence, as has been demonstrated in
e.g. [131, 132]. This program is also called clover-improvement, since the perturbative correction has
the shape of a four-leaf clover. Clover-improvement turned out to be useful in a number of studies
employing the hybrid Monte-Carlo (HMC) algorithm (see Sec. 3.5.2). When applying it to multiboson
(MB) algorithms (cf. Sec. 3.5.3), however, the required local staples (see the appendix) would soon become
extremely complicated and the merits of the improvement might become obscured by the increased
algorithmic demands.

Since the major focus of this thesis lies on Wilson fermions, a few words about its explicit breaking 
of chiral symmetry are in order. Having no chiral symmetry means that there is explicit symmetry 
breaking by the non-chiral fermion mass. Thus, the physics of spontaneous chiral symmetry breaking 
may be shadowed. In fact, it turns out to be extremely difficult to perform lattice calculations with 
light quarks since the numerical effort increases polynomially in the inverse quark mass |133| . However, 
when performing the continuum limit at sufficiently small quark masses (where the precise meaning 
of "sufficient" can only be given very roughly within xPT JH!); one can afterwards extrapolate to 
the desired quark mass and still be able to extract correct continuum physics from numerical lattice 
simulations. This is the method usually adopted in actual calculations employing light fermions. 
With the conventions used in this work, the Wilson action for a single fermion flavor reads: 

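$$ S_f = \sum_{xy} \bar\psi(y)\, Q(y,x)\, \psi(x) , \tag{2.80} $$

where the Wilson matrix $Q(y,x)$ is defined to be

$$ Q(y,x) = \delta(y,x) - \kappa \sum_{\rho=0}^{3} \left( U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \delta(y, x + \hat\rho) + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \delta(y, x - \hat\rho) \right) , \tag{2.81} $$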

with $\kappa$ being a function of the bare mass parameter which is called the hopping parameter. Due to the
anticommutativity of the fermion field, one also has to include antiperiodic boundary conditions in the
coupling to the gauge field, see [23] for a thorough discussion. This usually proceeds by choosing all
$U_0(x) \to -U_0(x)$, with $x$ restricted to a single timeslice, when applying the matrix multiplication with
(2.81). This sign is not explicitly written here. For the local form to be discussed in the appendix, however,
it is necessary to treat this factor separately.

The matrix $Q(y,x)$ in (2.81) consists of the local $\delta$-function contribution and a "derivative" term
containing nearest-neighbor interactions. This is often called the hopping matrix, $D(y,x)$, and can be
considered to be the lattice version of the covariant derivative in the continuum Dirac matrix, Eq. (2.54).
The "mass" has been taken to unity and the hopping parameter $\kappa$ has been written in front of the
lattice derivative term, which can be achieved by a redefinition of the fields. Thus, the Wilson matrix
explicitly breaks chiral symmetry on the lattice. As will soon become clear, one can nonetheless recover
the correct chiral behavior by fine-tuning the $\kappa$ parameter. In terms of the hopping matrix, Eq. (2.81)
can be written as

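$$ Q(y,x) = \delta(y,x) - \kappa\, D(y,x) , $$

$$ D(y,x) = \sum_{\rho=0}^{3} \left( U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \delta(y, x + \hat\rho) + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \delta(y, x - \hat\rho) \right) . \tag{2.82} $$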

The Wilson matrix, $Q(y,x)$, fulfills the $\gamma_5$-hermiticity property

$$ Q^\dagger(y,x) = \gamma_5\, Q(y,x)\, \gamma_5 , \tag{2.83} $$

as can be seen from inspection. This leads to the following properties: the eigenvalues are either real or
come in complex conjugate pairs. If one takes the determinant of $Q(y,x)$, it can therefore only change
sign if an odd number of purely real eigenvalues becomes negative. At this point it should also be
remarked that the total number of real eigenvalues is in the continuum related to the topological charge
via the Atiyah-Singer index theorem [51]. For an investigation of the validity of the index theorem on
the lattice see [T33IH35] . 
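
The consequences of $\gamma_5$-hermiticity for the determinant can be checked on a toy matrix. The sketch below is not the Wilson matrix: it assumes a diagonal stand-in for $\gamma_5$ with entries $(1, 1, -1, -1)$ and builds $Q = \gamma_5 H$ from a random Hermitian $H$, which fulfills $Q^\dagger = \gamma_5 Q \gamma_5$ by construction. All names are hypothetical.

```python
import random

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0 + 0.0j
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) == 0.0:
            return 0.0j
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result          # a row swap flips the sign
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for col in range(k, n):
                a[r][col] -= factor * a[k][col]
    return result

random.seed(1)
n = 4
gamma5 = [1, 1, -1, -1]               # diagonal toy gamma_5 with det = +1

# Random Hermitian matrix H; Q = gamma5 * H is then gamma5-hermitian
h = [[0j] * n for _ in range(n)]
for i in range(n):
    h[i][i] = complex(random.uniform(-1, 1), 0.0)
    for j in range(i + 1, n):
        h[i][j] = complex(random.uniform(-1, 1), random.uniform(-1, 1))
        h[j][i] = h[i][j].conjugate()
q = [[gamma5[i] * h[i][j] for j in range(n)] for i in range(n)]

# Largest violation of Q^dagger = gamma5 Q gamma5, entry by entry
herm_defect = max(abs(q[j][i].conjugate() - gamma5[i] * q[i][j] * gamma5[j])
                  for i in range(n) for j in range(n))
det_q = det(q)                        # real, since eigenvalues are real or paired
det_q_tilde = det(h)                  # gamma5 * Q equals the Hermitian matrix H
```

Since complex eigenvalues appear in conjugate pairs, the determinant of the toy $Q$ is real up to rounding, and it agrees with the determinant of the Hermitian matrix $\gamma_5 Q$ because the toy $\gamma_5$ has unit determinant.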

The spectrum of the hopping matrix $D(y,x)$ in Eq. (2.82) has been examined in [136]. For recent
overviews and results obtained from eigenvalue methods, see [137, 138, 139]. In general, the following
picture emerges: For a configuration with $\beta = 0$ (cf. Eq. (2.67)), the spectrum fills a disc centered at the
origin with radius two (see Fig. 2.8). In the small coupling regime, the structure is more complicated
(consult Fig. 2.9): The outer shape of the eigenvalues forms an ellipse which has a large radius of eight
and a small radius of four. However, four circles with radius two each, centered on the real axis, are
left out. At intermediate values of $\beta$, one finds spectra interpolating between these two situations: the
spectrum starts to spread and the holes start to form, but the eigenvalue density is not yet completely
zero in the holes. In particular, the real eigenvalues tend to populate the bulks for a rather long time
compared to the imaginary ones (see [134]). When measuring the lattice spacing in physical units, $a$,
one finds that $O(a)$ effects manifest themselves prominently in the real eigenvalues still lying in the
holes [El. 

Considering then the complete Wilson matrix, $Q(y,x)$, one finds that the lower bound of the spectrum
becomes zero if (in the free case) $\kappa_{free} = 1/8$. A derivation of this result for free configurations can also
be found in [22]. In this case, the Wilson matrix describes massless Dirac fermions. This point is called
the chiral point and the associated value of $\kappa$ is called the critical value $\kappa_{crit}$.

Hence, if $\beta$ increases from zero to $\infty$, the values of $\kappa_{crit}$ decrease from $\kappa_{crit} = 1/4$ down to $\kappa_{crit} = \kappa_{free}$.
For practical determinations of $\kappa_{crit}$ see Sec. 6.2.
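
The free-field statements can be illustrated in momentum space. The sketch assumes unit gauge links, periodic boundary conditions in all four directions (the antiperiodic time boundary is ignored here), and Hermitian anticommuting $\gamma$-matrices, so that the hopping matrix has the eigenvalues $2 \sum_\rho \cos p_\rho \pm 2 i \sqrt{\sum_\rho \sin^2 p_\rho}$. The function name and the lattice size are hypothetical choices.

```python
import itertools
import math

def free_hopping_eigenvalues(L):
    """Eigenvalues of the free hopping matrix on a periodic L^4 lattice
    (momentum-space form, each value listed once without spin degeneracy)."""
    momenta = [2.0 * math.pi * k / L for k in range(L)]
    eigenvalues = []
    for p in itertools.product(momenta, repeat=4):
        re = 2.0 * sum(math.cos(pm) for pm in p)
        im = 2.0 * math.sqrt(sum(math.sin(pm) ** 2 for pm in p))
        eigenvalues.append(complex(re, im))
        eigenvalues.append(complex(re, -im))
    return eigenvalues

eigs_d = free_hopping_eigenvalues(4)
max_real = max(e.real for e in eigs_d)   # large radius of the ellipse
max_imag = max(e.imag for e in eigs_d)   # small radius of the ellipse

# Eigenvalues of Q = 1 - kappa * D; the spectrum touches zero at kappa = 1/8
kappa_free = 1.0 / 8.0
min_abs_q = min(abs(1.0 - kappa_free * e) for e in eigs_d)
min_abs_q_below = min(abs(1.0 - 0.12 * e) for e in eigs_d)
```

In this setting the extent of the spectrum along the real and imaginary axes reproduces the radii eight and four quoted above, and the zero mode of $Q$ at $\kappa = 1/8$ comes from the zero-momentum state.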
Figure 2.8: Spectrum of the hopping matrix, $D(y,x)$ in Eq. (2.82), in the limit $\beta \to 0$.



The Wilson matrix given in equation (2.82) is not only non-Hermitian, it is even non-normal, i.e.

$$ \left[ Q(y,x), Q^\dagger(y,x) \right] \neq 0 . $$

Thus, it cannot be diagonalized by a unitary matrix; however, it is possible to diagonalize the Wilson
matrix by a similarity transformation with non-unitary matrices,

$$ Q^{\mathrm{diag}} = S^{-1} \cdot Q \cdot S . \tag{2.84} $$
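
A two-by-two toy example shows what non-normality means in practice. The matrix below is a hypothetical stand-in chosen for illustration, not the Wilson matrix: it does not commute with its adjoint, yet it is diagonalized by a similarity transformation whose matrix $S$ is not unitary.

```python
def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    """Adjoint of a real matrix (plain transpose)."""
    return [list(col) for col in zip(*a)]

q = [[1, 1],
     [0, 2]]                              # toy non-normal matrix

qq_dag = matmul(q, dagger(q))
q_dag_q = matmul(dagger(q), q)
commutator = [[qq_dag[i][j] - q_dag_q[i][j] for j in range(2)] for i in range(2)]

# Columns of S are the right eigenvectors (1, 0) and (1, 1) of q
s = [[1, 1],
     [0, 1]]
s_inv = [[1, -1],
         [0, 1]]
diagonal = matmul(s_inv, matmul(q, s))    # similarity transformation S^-1 Q S
s_s_dag = matmul(s, dagger(s))            # differs from the identity: S is not unitary
```

The right eigenvectors (the columns of $S$) are not orthogonal, which is why left and right eigenvectors differ for such matrices.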

Figure 2.9: Spectrum of the hopping matrix, $D(y,x)$ in Eq. (2.82), in the limit $\beta \to \infty$.

A further consequence of non-normality is that $Q(y,x)$ will in general have different left and right
eigenvectors [140], a property which should be respected in the definition of matrix elements in terms of
the eigenvectors [66, 141]. In several cases (as is the case for the sampling algorithms to be discussed in
Chapter 3), one only needs the determinant $\det Q$. Therefore it is often convenient to use the Hermitian
variant of the Wilson action, which can be obtained by replacing the matrix $Q(y,x)$ by the Hermitian
Wilson matrix $\tilde Q(y,x)$:

$$ \tilde Q(y,x) = \gamma_5\, Q(y,x) . \tag{2.85} $$

It is easy to show that $\tilde Q(y,x)$ is in fact self-adjoint:

$$ \tilde Q^\dagger(y,x) = Q^\dagger(y,x)\, \gamma_5 = \gamma_5\, Q(y,x) = \tilde Q(y,x) . $$

The spectrum of $\tilde Q(y,x)$ is more complicated than the spectrum of $Q(y,x)$ and determining the sign of
the determinant is a non-trivial task. Exploiting the fact that $\det \gamma_5 = 1$, one can, however, always use

$$ \det \tilde Q = \det Q . $$

Even-Odd Preconditioning 

A simple transformation allows the Wilson action to be rewritten [142, 143] such that the condition
number is reduced. To do this we divide the lattice into two distinct subsets of "even" and "odd"
coordinates:

Even-odd splitting: If the coordinates of a given lattice site are given by $(t, x, y, z) \in \mathbb{Z}_\Omega$, then a point
belongs to the "odd" subset iff

$$ (t + x + y + z) \bmod 2 = 1 . $$

Otherwise it belongs to the "even" subset.

If we rearrange the components of the vector in (2.81) in such a way that the color spinor is given
by $(\psi_{even}, \psi_{odd})$ with the first half being "even" sites and the second half "odd" sites, then the Wilson
matrix (2.81) takes the following shape:

$$ Q(y,x) = \begin{pmatrix} 1 & -\kappa D_{eo} \\ -\kappa D_{oe} & 1 \end{pmatrix} . \tag{2.86} $$
Using the Schur decomposition [144]

$$ \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det A \, \det\left( D - C A^{-1} B \right) , \tag{2.87} $$

one arrives at the preconditioned action

$$ \tilde Q = \gamma_5 \left( 1 - \kappa^2 D_{oe} D_{eo} \right) . \tag{2.88} $$

Since this matrix has the same determinant as (2.81), it yields the same action (2.80). However, the
smallest eigenvalue is about a factor of two larger, making the inversion simpler. On the other hand,
(2.88) has a more complicated shape (it now contains next-to-nearest neighbor interactions). Therefore
the total effort for a matrix multiplication stays the same, but the memory requirement for a color
spinor has been reduced.
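
The determinant identity behind even-odd preconditioning can be verified exactly on a toy example. The sketch below uses small random integer blocks as hypothetical stand-ins for $D_{eo}$ and $D_{oe}$ (the real blocks carry color and spinor structure) and exact rational arithmetic, so that both determinants can be compared without rounding. The factor $\gamma_5$ is left out since it has unit determinant.

```python
import random
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for k in range(n):
        pivot = next((r for r in range(k, n) if a[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for col in range(k, n):
                a[r][col] -= factor * a[k][col]
    return result

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def is_odd_site(t, x, y, z):
    """Even-odd splitting: a site is 'odd' iff (t + x + y + z) mod 2 == 1."""
    return (t + x + y + z) % 2 == 1

random.seed(7)
n = 3                                     # toy number of even (= odd) sites
kappa = Fraction(1, 8)
d_eo = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]
d_oe = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]

# Full matrix in even-odd ordering: [[1, -kappa D_eo], [-kappa D_oe, 1]]
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
top = [identity[i] + [-kappa * d_eo[i][j] for j in range(n)] for i in range(n)]
bottom = [[-kappa * d_oe[i][j] for j in range(n)] + identity[i] for i in range(n)]
full = top + bottom

# Preconditioned matrix acting on half of the sites: 1 - kappa^2 D_oe D_eo
product = matmul(d_oe, d_eo)
reduced = [[identity[i][j] - kappa ** 2 * product[i][j] for j in range(n)]
           for i in range(n)]
```

With $A = D = 1$ in Eq. (2.87), the determinant of the full matrix equals that of the reduced matrix, which lives on only half of the lattice sites.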



2.6.5 Yang-Mills Theory 

Finally, one can write down the total discretized form of the continuum Yang-Mills action whose
Lagrangian is given in Eq. (2.57):

x i±v ^ ' xy 

= S' g + lndetQ(y,a;), (2.89) 

with $Q(y,x)$ being the Wilson matrix (2.81). The bare mass parameter $\kappa$ appearing in $Q(y,x)$ refers
to the contribution of dynamical sea quarks (i.e. the virtual quark loops). It is therefore termed $\kappa_{sea}$.
The evaluation of the determinant becomes increasingly difficult as $\kappa_{sea}$ approaches $\kappa_{crit}$, whose precise
value can only be determined non-perturbatively (see Sec. 6.2). Since the evaluation of the determinant
of such a huge matrix is highly difficult, it is sometimes set equal to one (which corresponds to
$\kappa_{sea} = 0$), resulting in the fermionic contribution to (2.89) being totally absent. If this is only
done in the generation of configurations (i.e. the ensemble is sampled with the pure gauge action),
this amounts to removing the contributions of sea quarks. This defines the quenched approximation
mentioned in Sec. 2.5.






3 Numerical Methods 



In this chapter, the numerical methods are introduced which are required to simulate lattice gauge 
theories with and without dynamical fermion contributions. 

The properties of Monte-Carlo algorithms are introduced in Sec. 3.1. They make use of Markov chains,
as will be explained in Sec. 3.1.1.

Section 3.2 introduces the subject of time series analysis, in particular the analysis of
autocorrelations in a Monte-Carlo time series.

The measurement of hadronic masses is discussed in Sec. 3.3.

The particular algorithms used for the Monte-Carlo integration scheme are given in Secs. 3.4 and 3.5.
The former concentrates on the algorithms required for scalar and gauge fields, while the latter
introduces algorithms applicable to simulations with dynamical fermionic contributions. The most important
bosonic algorithms are the Metropolis algorithm (Sec. 3.4.1), the heatbath algorithm (Sec. 3.4.2) and
the overrelaxation technique (Sec. 3.4.3).

The algorithms for sampling contributions of dynamical fermions are treated in Sec. 3.5. First the
general problems one encounters when evaluating the determinant of the Wilson matrix are introduced
in Sec. 3.5.1. It will become clear that any algorithm dealing with the fermionic determinant requires
the inversion of a large matrix describing the contribution of the discretized fermionic degrees of freedom.
Then the most widely used algorithm for the simulation of dynamical fermion flavors, the hybrid
Monte-Carlo (HMC) algorithm, is reviewed in Sec. 3.5.2.

In this thesis, however, a more advanced algorithm for this subject will be used, namely a variant of
the multiboson (MB) algorithms. This class of algorithms is discussed in Sec. 3.5.3. These algorithms
are able to overcome several limitations and shortcomings of the HMC, but at the cost of far more
complexity.

As has been mentioned above, matrix inversion is an essential tool for the implementation of fermion
algorithms. The tools required for the implementation of matrix inversion algorithms are described in
Sec. 3.6. The inversion algorithms presented are static algorithms (Sec. 3.6.1), the Conjugate-Gradient
iteration, the GMRES algorithm, and the BiCGStab scheme.
Finally, the tools for the computation of eigenvalues of matrices are briefly reviewed in Sec. 3.7. They
are important for the application of static matrix inversion schemes and thus for the implementation of
multiboson algorithms.

3.1 Monte-Carlo Algorithms 

The path integral definition introduced in Sec. 2.3.3 allows for an evaluation using ensembles of field
configurations as discussed in Sec. 2.3.4. This definition, however, requires performing an integration
on an infinite space of operator-valued distributions $\{\phi\}$ with a given probability distribution $\rho\{\phi\}$ and
a measure $[d\phi]$. An approach different from the reformulation in terms of Gaussian integrals discussed
in Sec. 2.3 is the application of a numerical integration using a Monte-Carlo scheme. That such an
endeavor can indeed yield physical results in quantum field theories was first demonstrated in [145].
In this section it will be demonstrated how an algorithm can be designed in such a way that it generates
a finite set of independent gauge field configurations which can be used as an estimator of the ensemble
averages and thus of the path integral (2.40).

A Monte-Carlo integration algorithm is an algorithm which computes a finite set of mesh points and
yields a statistical approximation $\bar A \approx \langle A \rangle$ to the given problem. The error of the approximation is
given by the statistical error of the integration scheme. To be specific, consider an algorithm which
generates a finite sequence $[\phi_n]$, $n = 1, \dots, N$, of $N$ statistically independent configurations. These must
be distributed according to the probability density $\rho\{\phi\}$ of the underlying ensemble. The finite sequence
is called the sample of configurations. If they have been taken randomly from the ensemble (2.46), the
sample average

$$ \bar A = \frac{1}{N} \sum_{n=1}^{N} e^{-S[\phi_n]} A[\phi_n] \tag{3.1} $$

is an estimator for the ensemble average $\langle A \rangle$ with an error given by the variance of the statistical
estimate (2.41). Consequently, the error of the Monte-Carlo integration behaves as $1/\sqrt{N}$. This is
different from the standard integration schemes like Simpson's rule 02] whose error behaves as $N^{-4/d}$,
with $d$ being the dimension of the underlying space. Obviously, Simpson's rule is better for low dimensions
($d < 8$) and worse for higher dimensions ($d > 8$). There are better algorithms than Simpson's rule,
but none is competitive with Monte-Carlo integrations in very large dimensions. On the other hand,
no Monte-Carlo integration is competitive with deterministic algorithms at lower dimensions.
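
The $1/\sqrt{N}$ behavior can be tried out on a simple integral with a known result. The following sketch integrates $f(x) = \sum_i x_i^2$ over the unit hypercube in ten dimensions, where the exact value is $d/3$; the test function, the sample size and the seed are arbitrary illustrative choices and not part of the text.

```python
import math
import random

def mc_integrate(f, dim, n_samples, rng):
    """Plain Monte-Carlo estimate of the integral of f over [0, 1]^dim.
    Returns the estimate and its statistical error (standard error of the mean)."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        value = f([rng.random() for _ in range(dim)])
        total += value
        total_sq += value * value
    mean = total / n_samples
    variance = total_sq / n_samples - mean * mean
    return mean, math.sqrt(variance / n_samples)

def f(x):
    return sum(xi * xi for xi in x)       # exact integral over [0, 1]^d is d / 3

def simpson_exponent(dim):
    """Exponent s in the Simpson error N^(-s) with s = 4 / d."""
    return 4.0 / dim

rng = random.Random(12345)
dim = 10
estimate, error = mc_integrate(f, dim, 20000, rng)
exact = dim / 3.0
print(f"estimate = {estimate:.4f} +/- {error:.4f}, exact = {exact:.4f}")
```

The Monte-Carlo exponent $1/2$ is independent of $d$, whereas `simpson_exponent` drops below $1/2$ for $d > 8$, which is the crossover quoted above.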
It is obvious how to generalize Eq. (3.1) if the sample configurations have been drawn from the canonical
ensemble (2.48). Then the estimator is given by the sample average

$$ \bar A = \frac{1}{N} \sum_{n=1}^{N} A[\phi_n] . \tag{3.2} $$

Since such an integrand may be peaked rather narrowly around its average value, the sampling algorithm
should generate only the relevant contributions. Such a procedure is called importance sampling.
In the following, the theoretical basis needed to design Monte-Carlo algorithms from Markov chains is
laid out.



3.1.1 Markov Chains 

An important concept for the design of an algorithm yielding the desired sample of configurations is
the Markov chain:

Markov chain: A Markov chain $([\phi], \rho[\phi], \mathcal{P})$ consists of a set of states $[\phi_n]$ defined on a base space. For
the purposes of this thesis, this is the space of discretized fields, $\mathbb{Z}_\Omega$. A specific element $\phi_{i+1}$ is
generated from the previous element $\phi_i$ by a stochastic process $\mathcal{P}$:

$$ \phi_{i+1} = \mathcal{P} \phi_i . $$

The associated transition probability is given by the matrix element $\mathcal{P}([\phi_i] \to [\phi_{i+1}])$. It solely
depends on the state $\phi_i$. The Markov density $\rho[\phi]$ is a unit vector in the state space spanned by
all $\phi$, in which the matrix $\mathcal{P}([\cdot] \to [\cdot])$ acts.

If the states $[\phi_n]$ have the probability distribution $\rho[\phi]$, applying $\mathcal{P}$ once to the end of the chain may
change the probability distribution. With the initial distribution given by $\rho_n[\phi]$, one obtains a new
distribution $\rho_{n+1}[\phi]$ via

$$ \mathcal{P} \rho_n[\phi] = \sum_{\phi'} \mathcal{P}([\phi'] \to [\phi])\, \rho_n[\phi'] = \rho_{n+1}[\phi] . $$

Using this language one can define the following notions related to Markov chains:

Irreducibility: Denote $\phi_j = \mathcal{P}^{(M)} \phi_i$ for $M$ repeated applications of $\mathcal{P}$ on $\phi_i$, yielding $\phi_j$. A chain is
called irreducible if for any states $\xi, \zeta \in \mathbb{Z}_\Omega$, there exists an $M > 0$ such that

$$ \zeta = \mathcal{P}^{(M)} \xi . $$

Aperiodicity: Define $p_{ij}^{(M)} = \mathcal{P}([\phi_i] \to [\phi_{i+1}]) \cdots \mathcal{P}([\phi_{j-1}] \to [\phi_j])$ to be the $M$-step transition
probability to reach $\phi_j$ from the starting element $\phi_i$ in $M$ steps. A chain is called irreducible and
aperiodic if for each pair $\phi_i, \phi_j \in \mathbb{Z}_\Omega$ there exists an $M_0 = M_0(\phi_i, \phi_j)$ such that $p_{ij}^{(M)} > 0$ for all
$M > M_0(\phi_i, \phi_j)$.

Recurrence time: Take a state $\xi \in \mathbb{Z}_\Omega$. Let $\mathcal{P}^{(M)}([\xi] \to [\xi])$ be the probability to reach $\xi$ after $M$
applications of $\mathcal{P}$ on $\xi$. Then the mean recurrence time is given by

$$ \sum_{M=1}^{\infty} M\, \mathcal{P}^{(M)}([\xi] \to [\xi]) . $$

Positivity: A state $\xi \in \mathbb{Z}_\Omega$ is called positive iff its mean recurrence time is finite.

Stationary distribution: A probability distribution $\rho[\phi]$ is called stationary distribution of the Markov
chain if it stays invariant under application of $\mathcal{P}$:

$$ \rho[\phi] = \mathcal{P} \rho[\phi] . $$

A particularly important class of Markov chains is given by the irreducible, aperiodic chains whose
states are positive. Indeed one can prove the following theorem:

Existence and uniqueness of the stationary point: Take an irreducible, aperiodic Markov chain with
positive states $[\phi_n]$ and transition function $\mathcal{P}$. Let the chain have the starting distribution $\tilde\rho[\phi_n]$.
Then the limiting probability distribution $\tilde\rho_{eq}[\phi]$,

$$ \tilde\rho_{eq}[\phi] = \mathcal{P} \tilde\rho_{eq}[\phi] = \lim_{M \to \infty} \mathcal{P}^{(M)} \tilde\rho[\phi_n] , $$

exists and is unique. It is thus a fixed point of $\mathcal{P}$.

From now on we will only consider Markov chains with this property. The transition probability
$\mathcal{P}([\cdot] \to [\cdot])$ has to be normalized, i.e. for all $\xi \in \mathbb{Z}_\Omega$ the following equation must hold:

$$ \sum_{\zeta} \mathcal{P}([\xi] \to [\zeta]) = 1 . \tag{3.3} $$

Now we can generate the desired sample of field configurations $[\phi_n]$ as the states of a Markov chain by the
repeated application of $\mathcal{P}$ on the last state $\phi_n$ of the sample, thus generating new members of the sample
and improving the approximation. However, we must ensure that the transition probability
is designed in such a way that the samples are taken from the desired ensemble of configurations, i.e. the
density of the sample, $\rho[\phi]$, must equal the density of the ensemble, $\rho\{\phi\}$.

Since the application of $\mathcal{P}$ on a state $\phi$ may change the probability density $\rho[\phi]$, we have to design the
process in such a way that the stationary distribution of the Markov chain is given by the
density of the ensemble under consideration:

$$ \mathcal{P} \tilde\rho_{eq}[\phi] = \tilde\rho_{eq}[\phi] = \rho\{\phi\} . \tag{3.4} $$

Knowing from above that the fixed point exists and that it is unique, one can formulate the following
sufficient (but not necessary) condition for the transition probability $\mathcal{P}([\phi_n] \to [\phi_{n+1}])$ of the transition
matrix $\mathcal{P}$:

$$ \mathcal{P}([\phi_{n+1}] \to [\phi_n])\, \rho[\phi_{n+1}] = \mathcal{P}([\phi_n] \to [\phi_{n+1}])\, \rho[\phi_n] . \tag{3.5} $$

Summing on both sides over the complete state space $\{\phi_n\}$ and using (3.3) one arrives at

$$ \sum_{\{\phi_n\}} \mathcal{P}([\phi_n] \to [\phi_{n+1}])\, \rho[\phi_n] = \sum_{\{\phi_n\}} \mathcal{P}([\phi_{n+1}] \to [\phi_n])\, \rho[\phi_{n+1}] = \rho[\phi_{n+1}] , $$

which is identical to Eq. (3.4). Relation (3.5) is known as detailed balance. It does not determine
the transition probability uniquely and thus one can design different algorithms sampling the field
configurations. However, since Eq. (3.4) is not a sufficient condition, it may happen that the algorithm
gets "stuck" in a local maximum of the density. Such a situation is difficult to detect and even more
difficult to handle. The only way to proceed in such cases is by using multicanonical sampling (see
Sec. 2.3.4). If one manages to find an action $S[\phi]$ which no longer has several distinct local maxima,
this problem is avoided. A typical situation where this may happen is if the system is close to a
first-order phase transition, where the system has comparable probabilities to exist in either one of two
different phases [33].
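
Detailed balance and its consequence, stationarity, can be checked exactly for a small discrete state space. The sketch below is an illustration under stated assumptions: it uses a Metropolis-type acceptance rule with uniform proposals on four states and an arbitrary target density, neither of which is taken from the text, and it works with exact rational numbers.

```python
from fractions import Fraction

def metropolis_matrix(rho):
    """Transition matrix P[i][j] = P([i] -> [j]) of a Metropolis chain
    with uniform proposals among the other states."""
    n = len(rho)
    p = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # propose j with probability 1/(n-1), accept with min(1, rho_j / rho_i)
                p[i][j] = Fraction(1, n - 1) * min(Fraction(1), rho[j] / rho[i])
        p[i][i] = 1 - sum(p[i])           # rejected proposals keep the chain in state i
    return p

# Arbitrary target density on four states (assumption for illustration)
rho = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
P = metropolis_matrix(rho)
n = len(rho)

# Normalization of the transition probability, cf. Eq. (3.3)
rows_normalized = all(sum(row) == 1 for row in P)
# Detailed balance, cf. Eq. (3.5)
detailed_balance = all(P[i][j] * rho[i] == P[j][i] * rho[j]
                       for i in range(n) for j in range(n))
# Stationarity of rho under one application of P, cf. Eq. (3.4)
stationary = all(sum(rho[i] * P[i][j] for i in range(n)) == rho[j] for j in range(n))
```

Other acceptance rules fulfilling Eq. (3.5) would pass the same checks, reflecting that detailed balance does not fix the transition probability uniquely.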

For an infinitely long Markov chain, we define the mean value $\bar A$ by

$$ \bar A = \sum_i \tilde\rho_{eq}[\phi_i]\, A[\phi_i] , \tag{3.6} $$

with $\tilde\rho_{eq}[\phi]$ being the stationary distribution of the Markov chain. Then the mean value $\bar A$ coincides
with the expectation value $\langle A \rangle$ from Eq. (2.40). For a finite sample $[\phi_n]$, $n < \infty$, the estimator $\bar A$
approximates $\langle A \rangle$ with an error of order $1/\sqrt{N}$ as discussed above.



3.2 Autocorrelation 

Although the Markov chain generates a new state only from the previous one without any knowledge 
of older states, the new state may be rather similar to the old one. Thus, the sample of configurations 
generated as states of the Markov chain will in general not be statistically independent. The correlation 
in the sequence of generated configurations can be made mathematically precise using the autocorrelation 
function of a time series. In the following $A_i$ denotes the measurement of $A[\phi_i]$ on a configuration $\phi_i$.
The time series then consists of the set of $\{A_i\}$, $i = 1, \dots, N$.



3.2.1 Autocorrelation Function 

The autocorrelation function is defined by

$$ C_{AA}(\tau) = \langle A_t A_{t+\tau} \rangle - \langle A_t \rangle^2 , \tag{3.7} $$

where the average of the infinite series is denoted as $\langle \cdot \rangle$. The set of states underlying Eq. (3.7) is infinite.
However, as already noted above, in practical calculations one deals with finite samples and therefore
is unable to compute the exact averages but only estimators. The estimator based on a finite sample
of length $N$ for (3.7) is given by

$$ \bar C_{AA}(\tau) = \frac{1}{N - \tau} \sum_{t=1}^{N-\tau} A_t A_{t+\tau} - \left( \frac{1}{N} \sum_{t=1}^{N} A_t \right)^2 . \tag{3.8} $$

The autocorrelation function with $\tau = 0$ is the variance of the series. The normalized
autocorrelation function $\Gamma_{AA}(\tau)$ is defined by

$$ \Gamma_{AA}(\tau) = C_{AA}(\tau) / C_{AA}(0) . \tag{3.9} $$
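
A short sketch shows how such an estimator behaves on a correlated time series. It assumes the simple estimator "sample mean of $A_t A_{t+\tau}$ minus the squared sample mean" and uses a synthetic autoregressive series with coefficient 0.8 as a stand-in for Monte-Carlo measurements; for this series the normalized autocorrelation is expected to fall off like $0.8^\tau$. All names and parameters are hypothetical.

```python
import random

def autocorrelation(series, tau):
    """Estimator of C_AA(tau): mean of A_t * A_{t+tau} minus the squared sample mean."""
    n = len(series)
    mean = sum(series) / n
    pairs = sum(series[t] * series[t + tau] for t in range(n - tau)) / (n - tau)
    return pairs - mean * mean

def normalized_autocorrelation(series, tau):
    """Gamma_AA(tau) = C_AA(tau) / C_AA(0)."""
    return autocorrelation(series, tau) / autocorrelation(series, 0)

# Synthetic correlated series A_{t+1} = phi * A_t + noise (assumption for illustration)
rng = random.Random(2024)
phi = 0.8
series = [0.0]
for _ in range(50000):
    series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
series = series[1000:]                    # discard the start-up phase

gamma = [normalized_autocorrelation(series, tau) for tau in range(4)]
print([round(g, 3) for g in gamma])
```

Successive measurements of this series are clearly not statistically independent, which is exactly the situation the autocorrelation function is meant to quantify.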
3.2.2 Exponential Autocorrelation Time

One important piece of information the autocorrelation function yields is the time the system needs to
equilibrate, i.e. the time needed until the system goes from an arbitrary starting point $\rho[\phi]$ to the stationary
probability density $\tilde\rho_{eq}[\phi]$. To study this behavior, let $\tilde\rho_{inter}[\phi]$ be a given probability measure on $\mathbb{Z}_\Omega$ at
an arbitrary intermediate state taken from the Markov chain and $\tilde\rho_{eq}[\phi]$ the equilibrium distribution of
the Markov chain. Let $l^2(\phi)$ denote the Banach space of complex-valued functions $f(\rho[\phi])$ on the state
space $\mathbb{Z}_\Omega$ having finite norm

$$ \| f \|_{l^2(\phi)} = \left( \sum_{\phi} \left| f(\rho[\phi]) \right|^2 \right)^{1/2} < \infty . \tag{3.10} $$

The inner product in this space is given by

$$ (f | g) = \sum_{\phi} f^*(\rho[\phi])\, g(\rho[\phi]) . \tag{3.11} $$

Then we define the deviation of p intBI [4>] from |5 Eq [<fi] by [1461 1441 I133| : 

d 2 (/W [</>], P Bq [0] ) = 1 1 Antcr [</>]" /5 E q [</>] 1 1 



sup 

lli2 W : 



(3.12) 



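In general, the transition matrix $\mathcal{P}$ for an irreducible, positive-recurrent Markov chain has the following
properties:

Contraction: The spectrum of $\mathcal{P}$ lies in the closed unit disc. Consequently, $\mathcal{P}$ is a contraction.

Eigenvalues of the stationary distribution: The eigenvalue 1 of $\mathcal{P}$ is simple. The operator $\mathcal{P}^*$ has the
same properties.

Uniqueness: If the chain is aperiodic, then 1 is the only eigenvalue of $\mathcal{P}$ (and of $\mathcal{P}^*$) on the unit circle.
The eigenvector is the unit vector in $\mathbb{Z}_\Omega$.

If $\rho_{\mathrm{inter}}[\phi]$ has been obtained from a starting distribution $\rho_{\mathrm{start}}[\phi]$ by a single application of $\mathcal{P}$, it follows
that

$$d_2\big(\mathcal{P}\rho_{\mathrm{start}}[\phi], \rho_{\mathrm{Eq}}[\phi]\big) \le \big\|\mathcal{P}\restriction\mathbf{1}^\perp\big\|\, d_2\big(\rho_{\mathrm{start}}[\phi], \rho_{\mathrm{Eq}}[\phi]\big) . \qquad (3.13)$$

The spectral radius formula [146] yields:

$$\big\|\mathcal{P}\restriction\mathbf{1}^\perp\big\| \propto R := \exp\left(-\frac{1}{\tau_{\exp}}\right) . \qquad (3.14)$$

Thus, $R$ is the spectral radius of $\mathcal{P}$ on the orthogonal complement of the identity, i.e. the largest
modulus of the eigenvalues of $\mathcal{P}$ with $|\lambda| < 1$. The definition of $\tau_{\exp}$, Eq. (3.14), maps the spectral
radius $R \in [0,1[$ onto $\tau_{\exp} \in [0,\infty[$. Hence, a scale in the Markov chain has been introduced. After $M$
applications of $\mathcal{P}$ one arrives at

$$d_2\big(\mathcal{P}^M\rho_{\mathrm{start}}[\phi], \rho_{\mathrm{Eq}}[\phi]\big) \lesssim \exp\left(-\frac{M}{\tau_{\exp}}\right) . \qquad (3.15)$$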

The meaning of $\tau_{\exp}$ is that of a relaxation parameter. The number of steps required for the system to
reach the fixed point distribution starting from an arbitrary distribution is characterized by this time
scale. It may happen that $\tau_{\exp}$ even becomes infinite. In such a case, one can never reach the
equilibrium by starting from an arbitrary configuration in finite time.
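
The connection between the spectrum of the transition matrix and the relaxation time can be checked on the smallest non-trivial example. The sketch below assumes a hypothetical two-state chain with hopping probabilities `p` and `q` (not a system studied in this thesis), for which the eigenvalues of the transition matrix are 1 and $1-p-q$; the function names are illustrative.

```python
import math

def tau_exp_two_state(p, q):
    """Exponential autocorrelation time of the chain with P = [[1-p, p], [q, 1-q]]."""
    lam2 = 1.0 - p - q      # second eigenvalue; the first one is always 1
    r = abs(lam2)           # spectral radius on the complement of the stationary mode
    if r == 0.0:
        return 0.0          # equilibrium is reached in a single step
    if r >= 1.0:
        return math.inf     # reducible (p = q = 0) or periodic (p = q = 1) chain
    return -1.0 / math.log(r)

def apply_chain(rho, p, q, steps):
    """Apply the transition matrix `steps` times to the row vector rho = (rho_0, rho_1)."""
    r0, r1 = rho
    for _ in range(steps):
        r0, r1 = r0 * (1.0 - p) + r1 * q, r0 * p + r1 * (1.0 - q)
    return r0, r1
```

For p = 0.1 and q = 0.2 the deviation from the stationary distribution (2/3, 1/3) shrinks by a factor 0.7 per step, corresponding to a relaxation time of about 2.8 steps. The two limiting cases returning infinity are toy instances of the situation in which equilibrium is never reached.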

To actually compute $\tau_{\exp}$ for a given algorithm, one must find a good test function, i.e. an appropriate
observable in Eq. (3.9) with sufficient overlap to the slowest mode of the system. Thus, one can define $\tau_{\exp}$
via

$$\tau_{\exp} = \sup_{\{A\}} \lim_{\tau\to\infty} \frac{\tau}{-\ln \Gamma_{AA}(\tau)} , \qquad (3.16)$$

where several different observables $A$ must be considered. Of course, in practice one can never be sure
that the slowest mode of the system is captured by the set of observables chosen.

In practical situations, however, one does not work with the total density vector $\rho[\phi]$ of the system, but
rather one considers only the finite sample of configurations obtained by repeated application of $\mathcal{P}$ to
a single starting configuration. The probability of this configuration in the equilibrium density $\rho_{\mathrm{Eq}}[\phi]$
may be rather small, but it cannot be zero. The way to estimate a given density vector in the state
space of the Markov chain is then to histogram an observable and examine its distribution. For a gauge
theory on the lattice this could e.g. be the gluonic action. As long as the system has not thermalized, the
histogram will still change its shape when new configurations are added.

For the starting configuration it is common to either use a homogeneous set of variables, the cold start, 
or a set of random variables, the hot start. 

3.2.3 Integrated Autocorrelation Time 

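Once the Markov chain has reached the equilibrium density, there is still an autocorrelation between
subsequent measurements. This autocorrelation can be assessed by considering the integrated autocorrelation
time, $\tau_{\mathrm{int}}$. For an observable $A$, the latter is defined via [44, 133]

$$\tau^A_{\mathrm{int}} = \frac{1}{2}\sum_{\tau'=-\infty}^{\infty}\Gamma_{AA}(\tau') . \qquad (3.17)$$

The factor of 1/2 in (3.17) is a matter of convention. It ensures that $\tau^A_{\mathrm{int}} \approx \tau^A_{\exp}$ if $\Gamma_{AA}(\tau) \sim \exp(-|\tau|/\tau^A_{\exp})$
for $\tau^A_{\exp} \gg 1$. When applied to a finite sample of length $N$, one obtains an estimate via

$$\hat\tau^A_{\mathrm{int}} = \frac{1}{2} + \sum_{\tau'=1}^{N-1}\Gamma_{AA}(\tau') . \qquad (3.18)$$

$\tau^A_{\mathrm{int}}$ characterizes the statistical error of an observable $A$. This can be seen by considering the variance
$\sigma^2(\bar A)$ of the mean, Eq. (3.1):

$$\sigma^2(\bar A) = \frac{1}{N^2}\sum_{r,s=1}^{N} C_{AA}(r-s) = \frac{1}{N}\sum_{\tau=-(N-1)}^{N-1}\left(1 - \frac{|\tau|}{N}\right) C_{AA}(\tau)$$
$$\stackrel{N\gg\tau}{\approx} \frac{1}{N}\left(2\tau^A_{\mathrm{int}}\right) C_{AA}(0) . \qquad (3.19)$$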

Thus, if autocorrelations are present, the variance of the mean is increased by the factor $2\tau^A_{\mathrm{int}}$ compared to
the case of statistically independent configurations. It is obvious that the integrated autocorrelation time will in general
depend on the observable $A$, meaning that some quantities are harder to measure than others from
finite samples. This also depends on the algorithm underlying the Markov chain, i.e. on the choice of
the transition matrix $\mathcal{P}$.

As discussed in [133], the autocorrelation function $C_{AA}(\tau)$ may be composed of several different
exponentials. The fast decaying modes lead to a decrease of the contribution of the slower modes in the
integral. Therefore, observables with only a small overlap on the slowly decaying modes will usually
exhibit a smaller $\tau^A_{\mathrm{int}}$ than those dominated by the slower modes. Large spatial correlations on the
lattice may induce modes in the autocorrelation functions which are also large (since the information
has to propagate a larger distance through the lattice along the Markov chain). As a result,
the large correlation lengths which one encounters for smaller masses exhibit larger integrated
autocorrelation times, a result which was clearly visible in the samples contained in .
Recalling that $\tau_{\exp}$ is associated with the slowest mode in the system, one concludes that $\tau^A_{\mathrm{int}} \lesssim \tau^A_{\exp}$
for any observable $A$. This can also be shown by considering again the spectrum of $\mathcal{P}$. If detailed
balance holds, $\mathcal{P}$ is self-adjoint on the space $l^2(\phi)$. Hence, the spectrum is real and lies in an interval
$[\lambda_{\min}, \lambda_{\max}] \subseteq [-1,1]$ with

$$\lambda_{\min} = \inf \mathrm{spec}\left(\mathcal{P}\restriction\mathbf{1}^\perp\right) , \qquad \lambda_{\max} = \sup \mathrm{spec}\left(\mathcal{P}\restriction\mathbf{1}^\perp\right) . \qquad (3.20)$$

Using the spectral radius formula (3.14) again yields

$$\tau_{\exp} = -\frac{1}{\ln \lambda_{\max}} ,$$

where the slowest mode is associated with $\lambda_{\max}$. Restricting to a single observable $A$ with exponential
autocorrelation time $\tau^A_{\exp}$, one can write its normalized autocorrelation function in the form of a spectral representation

$$\Gamma_{AA}(\tau) = \int_{\lambda^A_{\min}}^{\lambda^A_{\max}} \lambda^{|\tau|}\, d\sigma_A(\lambda) . \qquad (3.21)$$

The fastest and slowest modes contributing for the observable $A$ have been denoted by $\lambda^A_{\min}$ and $\lambda^A_{\max}$. They form a
subinterval of $[\lambda_{\min}, \lambda_{\max}]$. Summing (3.21) over $\tau$ one finally arrives at

$$\tau^A_{\mathrm{int}} = \frac{1}{2}\int_{\lambda^A_{\min}}^{\lambda^A_{\max}} \frac{1+\lambda}{1-\lambda}\, d\sigma_A(\lambda) \le \frac{1}{2}\,\frac{1+\lambda^A_{\max}}{1-\lambda^A_{\max}} \int_{\lambda^A_{\min}}^{\lambda^A_{\max}} d\sigma_A(\lambda) .$$

This leads to

$$\tau^A_{\mathrm{int}} \le \frac{1}{2}\,\frac{1+\exp(-1/\tau^A_{\exp})}{1-\exp(-1/\tau^A_{\exp})} .$$



3.2.4 Scaling Behavior 

As has been discussed in Sec. 2.3.3, a quantum field theory usually will undergo a second order phase
transition as the continuum limit is approached. This implies that the correlation length, $\xi$, associated
with the system diverges. This divergence requires an increase in the lattice size, $L$, and usually also
means that the autocorrelation time increases rapidly. This phenomenon is known as critical slowing
down. In particular, the autocorrelation time diverges as [44]:

$$\tau \propto \min(L,\xi)^z , \qquad (3.22)$$

which defines the dynamic critical exponent $z$. Critical slowing down poses a problem for the numerical
simulation of dynamical systems since especially the critical points are points of major physical interest.

3.2.5 Short Time Series

When using Eq. (3.18) to estimate $\tau^A_{\mathrm{int}}$ for an observable on a finite time series, one still needs a sufficient
number of measurements. The particular problem is that for large $\tau$ the values of $C_{AA}(\tau)$ will have large noise,
but only small signals, since the function approaches zero while the errors do not [44]. To be specific,
the error can be computed using the approximation $\tau \ll M \ll N$:

$$\sigma^2(\hat\tau^A_{\mathrm{int}}) \approx \frac{2(2M+1)}{N}\left(\tau^A_{\mathrm{int}}\right)^2 . \qquad (3.23)$$

If the sum in (3.18) is cut off at a point $M < N$ (introducing a "window" of size $M$), one obtains
$\hat\tau^A_{\mathrm{int}}(M)$ via

$$\hat\tau^A_{\mathrm{int}}(M) = \frac{1}{2} + \sum_{\tau'=1}^{M}\Gamma_{AA}(\tau') . \qquad (3.24)$$

The trade-off is that by using (3.24), one introduces a bias

$$\mathrm{bias}(\hat\tau^A_{\mathrm{int}}) = -\frac{1}{2}\sum_{|\tau'|>M}\Gamma_{AA}(\tau') + o\left(\frac{1}{N}\right) . \qquad (3.25)$$

Thus, the bias will only be a finite-length effect of the time series which will vanish once the series is
long enough.

The choice of $M$ should be guided by the desire to make $\sigma(\hat\tau^A_{\mathrm{int}})$ small while on the other hand still
keeping the $\mathrm{bias}(\hat\tau^A_{\mathrm{int}})$ small.



Windowing Procedure 

One way to choose the window parameter $M$ is to apply the following recipe [44, 133]: Find the smallest
integer $M$ such that

$$M \ge c\,\hat\tau^A_{\mathrm{int}}(M) .$$

If $\Gamma_{AA}(\tau)$ were a pure exponential, then it would suffice to take $c$ as 4. This implies that $\Gamma_{AA}(\tau)$ would
have decayed by 98% since $e^{-4} < 2\%$. However, if $\Gamma_{AA}(\tau)$ does not show a clear exponential behavior,
then one has to consider $c \approx 6$ or still larger. For time series of the order of $N \approx 1000\,\tau$ this algorithm
works fine [44]; however, it is not clear how stable this procedure is for much smaller samples. Sadly,
in the numerical simulation of Euclidean field theories, one usually only has $N \approx (100 - 200)\,\tau$ or even
less, so this method alone is insufficient for obtaining a reliable estimate of $\tau^A_{\mathrm{int}}$.
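
The windowing recipe can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used in this thesis: the function name `tau_int_windowed` is hypothetical, the windowed sum is taken in the form one half plus the sum of the normalized autocorrelation function over positive lags, and the search is stopped at half the series length, beyond which the estimate would be dominated by noise.

```python
def tau_int_windowed(series, c=4.0):
    """Return (M, tau_int(M)) for the smallest window M with M >= c * tau_int(M)."""
    n = len(series)
    mean = sum(series) / n
    dev = [a - mean for a in series]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0.0:
        raise ValueError("series is constant; the autocorrelation is undefined")
    tau = 0.5                                  # contribution of the zero-lag term
    for m in range(1, n // 2):                 # assumed cutoff at N/2
        c_m = sum(dev[t] * dev[t + m] for t in range(n - m)) / (n - m)
        tau += c_m / c0                        # add Gamma_AA(m) to the windowed sum
        if m >= c * tau:                       # self-consistent window reached
            return m, tau
    raise ValueError("no self-consistent window found; the series is too short")
```

For a series with a purely exponential autocorrelation function that decays by a factor 1/2 per step, the exact integrated autocorrelation time is 1.5, and with $c = 4$ the window closes after a handful of lags. For short or noisy series the routine raises an error instead of returning an unreliable number, which mirrors the caveat made above.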



Lag-Differencing Method 

A typical indicator of a systematic bias might be that the autocorrelation function does not converge
to zero but rather approaches a constant before dropping to zero in a non-exponential manner. It
could also be that the autocorrelation function exhibits linear behavior. Being conservative, one would
conclude that in such a situation the time series is simply too short to give answers and that there is
no way to extract further information from it. If one is more practical, one may try to extract only the
exponential modes from the series and discard the linear behavior. This is what differencing does. In
[133], this new method for eliminating, or at least reducing, the bias (3.25) of the time series has been
suggested by Lippert. The idea is to apply a differencing prescription to the original series in order to
reveal the true autocorrelation behavior. This approach is justified because once the Markov density
$\rho[\phi]$ becomes stationary, the system will be unaffected by a shift in the time origin.
Define the order-$k$-lag-$l$-differenced time series by

$$(D^{(k)}_l A)_i = (D^{(k-1)}_l A)_{i+l} - (D^{(k-1)}_l A)_i , \qquad (D^{(1)}_l A)_i = A_{i+l} - A_i . \qquad (3.26)$$

Examining the estimator for the average $\langle (D^{(1)}_l A)\rangle$ shows that the first-order-differenced series indeed
goes to zero:

$$\langle (D^{(1)}_l A)\rangle = \frac{1}{N-l}\sum_{\tau'=1}^{N-l}\left(A_{\tau'+l} - A_{\tau'}\right) \stackrel{N\to\infty}{\longrightarrow} 0 .$$

One possible way to apply definition (3.26) is to examine the correlation between the original series
$\{A_i\}$ and the order-1 differenced series $\{(D^{(1)}_l A)_i\}$:

$$C_{A,D^{(1)}_l A}(\tau) = \frac{1}{N-\tau-l}\sum_{\tau'=1}^{N-\tau-l}\left(A_{\tau'} - \langle A\rangle\right)\left((D^{(1)}_l A)_{\tau'+\tau} - \langle (D^{(1)}_l A)\rangle\right)$$
$$\stackrel{N\to\infty}{\longrightarrow} C_{AA}(\tau+l) - C_{AA}(\tau) . \qquad (3.27)$$

A constant bias will be removed for $l > 0$, while the modes with scales below $l$ should not be affected.
However, when choosing $l$ too small, Eq. (3.27) will also destroy exponential modes larger than $l$. On
the other hand, the procedure will be ineffective if $l$ is too large since the statistical quality of the
sample will get worse. For this reason, we also believe that higher order differencing will not be useful
for practical purposes.

In practice one has to examine the autocorrelation function for a number of different lags. In the ideal
case, a plateau should form when plotting the estimated value for $\tau^A_{\mathrm{int}}$ from Eq. (3.27) vs. the lag $l$.
This fortunate case is, however, only rarely given since one would not need to apply the differencing
procedure in the first place if the statistics were good enough.

The recipe to apply this procedure which is used in this thesis consists of the following steps: (i) Get
a first rough estimate of the autocorrelation time $\hat\tau^A_{\mathrm{int}}$. This may be obtained by comparison to
different time series or by the other methods for computing autocorrelation times. (ii) Vary the lag $l$
and measure the function $\tau^A_{\mathrm{int}}(l)$ for the different lags. (iii) If the function exhibits a plateau with
$l > \hat\tau^A_{\mathrm{int}}$, the estimate for $\tau^A_{\mathrm{int}}(l)$ at the plateau is taken. If no plateau is formed even when going to
$l > 2\hat\tau^A_{\mathrm{int}}$, the method fails to give any reasonable answer.
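
The two building blocks of the differencing method, the differenced series of Eq. (3.26) and the correlation between the original and the differenced series, might be sketched as follows. The function names are hypothetical, and taking the means over the overlapping parts of the two series is an assumption made here for the finite-sample case. The step from this correlation to an estimate of the autocorrelation time is not shown.

```python
def lag_difference(series, lag, order=1):
    """Order-`order`, lag-`lag` differenced series following Eq. (3.26)."""
    if lag < 1 or order < 1:
        raise ValueError("lag and order must be positive integers")
    out = list(series)
    for _ in range(order):
        # Each pass shortens the series by `lag` elements.
        out = [out[i + lag] - out[i] for i in range(len(out) - lag)]
    return out

def cross_covariance(a, b, tau):
    """Covariance between a[t] and b[t + tau] over the overlapping range of both series."""
    terms = min(len(a), len(b) - tau)
    if terms < 1:
        raise ValueError("series too short for the requested lag")
    mean_a = sum(a[:terms]) / terms
    mean_b = sum(b[tau:tau + terms]) / terms
    return sum((a[t] - mean_a) * (b[t + tau] - mean_b) for t in range(terms)) / terms
```

Since differencing is linear, `cross_covariance(a, lag_difference(a, l), tau)` equals the difference of the covariances of the original series at lags `tau + l` and `tau`, evaluated over the same range. A constant offset added to the series drops out of the differenced series entirely, which is the mechanism by which a constant bias is removed.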



Jackknife Method 

As an independent consistency check, one can also exploit relation (3.19) to obtain an estimate for $\tau^A_{\mathrm{int}}$.
The method discussed in the following is called Jackknife binning and allows one to find the "true" variance
of a sample. In addition, it allows one to estimate the variance of "secondary quantities", i.e. a function
obtained from the average of the original sample. In the context of quantum field theories, secondary
quantities are given by observables which are defined to be expectation values and thus require an
averaging over the ensemble.

Reference [147] contains an introduction to the Jackknife procedure; for a complete discussion and
further applications consult Ref. [148].

The Jackknife method consists of the following steps:

1. Choose a block size $B \ge 1$ and partition the series into a number of blocks of size $B$. The total
number of blocks is then given by $M = N/B$. In the following it will be assumed that all blocks
have equal size (if $B$ is not a divisor of $N$, one can simply make the last block smaller; this has
no practical influence).

2. Define the averages $\{A^{(B)}_j\}$, $j = 1, \dots, M$, by

$$A^{(B)}_j = \frac{1}{N-B}\left(\sum_{n=1}^{N_1-1} A_n + \sum_{n=N_2+1}^{N} A_n\right) , \qquad (3.28)$$

with $N_1 = B(j-1)+1$ and $N_2 = jB$. Hence, $A^{(B)}_j$ is the average of the sample $\{A_i\}$ with the
$j$th block of size $B$ (ranging from $N_1$ to $N_2$, included) being left out.

3. Then define the Jackknife estimator for the average and its variance for bin size $B$ by

$$\bar A^{(B)} = \frac{1}{M}\sum_{n=1}^{M} A^{(B)}_n , \qquad \sigma^2_B(A) = \frac{M-1}{M}\sum_{n=1}^{M}\left(A^{(B)}_n - \bar A^{(B)}\right)^2 . \qquad (3.29)$$

4. Repeat the above procedure for different values of $B$ and take the limit $B \to \infty$. The corresponding
value of $\sigma^2_{B\to\infty}(A) = \sigma^2(A)$ is the true variance of the sample. In practice, one has to plot the
variance $\sigma^2_B$ vs. the bin size $B$ until a plateau emerges. The resulting plateau will then give an
estimate of the true variance. However, in general the resulting variances will fluctuate strongly,
making a precise determination impossible. The best one can do is then to take the average value
of the plateau as an estimate and the fluctuations as the errors on the variances.

After knowing the true variance, the integrated autocorrelation time can be estimated by

$$\hat\tau^A_{\mathrm{int}} = \frac{1}{2}\,\frac{\sigma^2_{B\to\infty}(A)}{\sigma^2_{B=1}(A)} . \qquad (3.30)$$

This approach, however, only allows for a crude estimate of $\tau^A_{\mathrm{int}}$, since one has no systematic control
of the error (see above). This has to be contrasted with the autocorrelation function where one can use
Eq. (3.23).

The generalization of the Jackknife method to secondary quantities, i.e. functions of the sample average,
$f(\{A_i\})$, is straightforward. Starting from the averages defined in (3.28), one defines the functions $f^{(B)}_j$
of $A^{(B)}_j$ and their variances analogously to Eq. (3.29) by

$$\bar f^{(B)} = \frac{1}{M}\sum_{n=1}^{M} f\left(A^{(B)}_n\right) , \qquad \sigma^2_B(f) = \frac{M-1}{M}\sum_{n=1}^{M}\left(f\left(A^{(B)}_n\right) - \bar f^{(B)}\right)^2 . \qquad (3.31)$$

With the obtained variances, one can proceed as before and apply (3.30) to get the autocorrelation time
of the secondary quantity.

The Jackknife method is applied in this thesis both to obtain an independent estimate of the autocorrelation
time and to obtain the true variance and thus the true error of both primary and secondary
quantities.
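
The Jackknife steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this thesis: the function names are hypothetical, a trailing incomplete block is simply dropped instead of being kept as a smaller last block, and a finite block size stands in for the limit of infinite block size when estimating the autocorrelation time.

```python
def jackknife(series, block_size, func=lambda x: x):
    """Delete-one-block Jackknife estimate (mean, variance) of func(sample average)."""
    n_blocks = len(series) // block_size       # assumption: drop a trailing partial block
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    n = n_blocks * block_size
    total = sum(series[:n])
    values = []
    for j in range(n_blocks):
        block_sum = sum(series[j * block_size:(j + 1) * block_size])
        # Average of the sample with the j-th block left out, then the secondary quantity.
        values.append(func((total - block_sum) / (n - block_size)))
    mean = sum(values) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean) ** 2 for v in values)
    return mean, variance

def tau_int_from_binning(series, block_size):
    """Half the ratio of the binned variance to the variance at block size one."""
    _, var_b = jackknife(series, block_size)
    _, var_1 = jackknife(series, 1)
    return 0.5 * var_b / var_1
```

For block size one the variance reduces to the familiar unbiased sample variance divided by the sample size, so the ratio then gives exactly 1/2. A secondary quantity such as the square of the mean is handled by passing, for example, `func=lambda x: x * x`. In practice the block size would be scanned and a plateau identified by eye, as described in step 4.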

3.3 Measuring Hadron Masses

In order to measure hadronic masses on the lattice, one needs to compute correlation functions of 
operators carrying the same quantum numbers as the hadron under consideration. For general reviews 
see 0B| 11491 1321 |2~4"| and |150| . On the lattice one has again a certain freedom for the construction of 
these operators. In this thesis the simplest operators are taken in accordance with [149]. For instance,
in the case of the charged pion and rho-meson (cf. Eqs. (2.2) and (2.3)), one obtains

$$\Phi_{\pi^+}(x) = \sum_a \bar d_a(x)\,\gamma_5\, u_a(x) , \qquad \Phi_{\rho^+}(x) = \sum_a \bar d_a(x)\,\vec\gamma\, u_a(x) , \qquad (3.32)$$

where $d_a(x)$ is the $d$-flavored quark field with color index $a$, and $u_a(x)$ the $u$-flavored quark field,
respectively. $\vec\gamma$ means that a summation over the three spatial $\gamma_i$-matrices has to be performed.
As discussed in [149], one can use the Källén-Lehmann representation of two-point functions in the
Euclidean region to derive the mass formula. In the case of a scalar field this is done via

$$\langle\Omega|\,T\{\phi(x)\phi(y)\}\,|\Omega\rangle = \int_{m_0^2}^{\infty} dm^2\, \rho(m^2)\, \Delta_E(x-y; m^2) ,$$

where the spectral weight function is positive and has the shape of a $\delta$-peak for single-particle states.
The Euclidean propagator, $\Delta_E(x-y; m^2)$, is given by

$$\Delta_E(x-y; m^2) = \int \frac{d^4k}{(2\pi)^4}\, \frac{\exp[i k_\mu (x_\mu - y_\mu)]}{m^2 + k_\nu k_\nu} .$$

Integrating over three-space yields a single "time-slice", defining the correlation function

$$\Gamma_\phi(t_1 - t_2) = \int d^3x\, \langle\Omega|\,T\{\phi(t_1,\vec x)\phi(t_2,\vec y)\}\,|\Omega\rangle = \int_{m_0}^{\infty} dm\, \rho(m^2) \exp[-m(t_1 - t_2)] . \qquad (3.33)$$

For large time separations, $t_1 - t_2 \to \infty$, the lowest mass state $m_0$ dominates.
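
The step from the Källén-Lehmann representation to the exponential form of Eq. (3.33) can be made explicit. The following is a sketch, assuming a positive time separation; the abbreviation $t = t_1 - t_2$ and the normalization $Z$ of the single-particle peak are introduced here for illustration only.

```latex
% The spatial integration projects onto zero spatial momentum; the remaining k_4 integral
% is evaluated by closing the contour around the pole at k_4 = i m.
\int d^3x\, \Delta_E(x - y; m^2)
  = \int \frac{dk_4}{2\pi}\, \frac{e^{i k_4 t}}{m^2 + k_4^2}
  = \frac{e^{-m t}}{2m} ,
\qquad
\int_{m_0^2}^{\infty} dm^2\, \rho(m^2)\, \frac{e^{-m t}}{2m}
  = \int_{m_0}^{\infty} dm\, \rho(m^2)\, e^{-m t} .
% For a single-particle peak, \rho(m^2) = Z\, \delta(m^2 - m_0^2), this reduces to
\Gamma_\phi(t) = \frac{Z}{2 m_0}\, e^{-m_0 t} ,
\qquad
m_0 = \ln \frac{\Gamma_\phi(t)}{\Gamma_\phi(t+1)} .
```

The second relation uses $dm^2 = 2m\, dm$. The logarithm of the ratio of neighbouring time slices, with time measured in lattice units, gives the mass directly only where a single state dominates; at shorter separations heavier states contribute as well.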

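In order to extend this construction to fermionic correlation functions, one needs the generalization of
Eq. (B.18) to arbitrary integrals of the Gaussian type (see [23 for a mathematical derivation):

$$\int [d\eta^\dagger][d\eta]\, \exp\left[-\eta^\dagger A \eta\right] \eta_{j_1}\eta^\dagger_{i_1} \cdots \eta_{j_M}\eta^\dagger_{i_M} \propto \det A \sum_{k_1 \dots k_M} \epsilon^{k_1 \dots k_M}_{j_1 \dots j_M}\, (A^{-1})_{k_1 i_1} \cdots (A^{-1})_{k_M i_M} , \qquad (3.34)$$

with

$$\epsilon^{k_1 \dots k_M}_{j_1 \dots j_M} = \begin{cases} 1 , & \text{where } k_1 \dots k_M \text{ is an even permutation of } j_1 \dots j_M , \\ -1 , & \text{where } k_1 \dots k_M \text{ is an odd permutation of } j_1 \dots j_M , \text{ and} \\ 0 , & \text{where } k_1 \dots k_M \text{ is no permutation of } j_1 \dots j_M . \end{cases}$$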



The sign factor from Eq. (B.18) has been dropped. The evaluation of a hadronic matrix element thus
requires recourse to the fermionic matrix, $Q(y,x)$. The bare mass which enters here is related
to the valence quark content of the hadron in question and is therefore termed $\kappa_{\mathrm{val}}$. It is thus
possible, as already argued in Sec. 2.5.3, to choose the valence quark mass appearing in the hadronic
operators different from the sea quark mass appearing in the measure which is used for the sampling
process, Eq. (2.89). In fact, for quenched simulations this is a necessity to derive hadronic masses. See
Sec. 2.5.3 for a discussion of these methods.

A meson correlation function

(3.35)

for large Euclidean times will then yield the desired mass. However, on a lattice with a finite extent, one
has to take into account finite size effects. Since periodic boundary conditions are usually implemented,
the lattice correlation function will be symmetric and the lattice masses will have to be extracted using:

$$\Gamma_m(t) = \exp[-t\,(am_m)] + \exp[-(L_t - t)\,(am_m)] , \qquad (3.36)$$

where the temporal lattice extension is taken to be $L_t$. The case of baryons is more involved, however.
See [148] for the latest methods and results.

Combining Eqs. (3.32), (3.34), and (3.35), the correlation function for the pion is given by

(3.37)

Fitting the resulting $\Gamma_{\pi^+}(t)$ to (3.36) for large values of $t$ will then yield the lattice pion mass, $(am_{\pi^+})$.

3.4 Bosonic Sampling Algorithms

The task of this section is to describe several algorithms realizing a Markov chain for the field configurations
$\phi_i \in \mathbb{Z}_\Omega$. Any algorithm should thus generate a new configuration $\phi_{i+1}$ from a given configuration
$\phi_i$, satisfying ergodicity and detailed balance, Eq. (3.5). Once the new configuration $\phi_{i+1}$ has been generated
by updating all degrees of freedom (d.o.f.), one denotes this procedure as a single sweep.
In general, one can divide the algorithms into two different classes:

Local algorithms: The local algorithms consider a subset of sites (usually only a single site at
a time) and change it according to a certain prescription. Then a different subset will be
considered until the whole lattice has been processed at least once. There is no global decision
taking place on the lattice. Usually local algorithms are constructed such that they satisfy detailed
balance and ergodicity locally, thus ensuring that the total sweep also satisfies these properties.

Global algorithms: All sites are updated at once according to a prescription not depending on
any sublattice or subset. These global update algorithms usually induce larger autocorrelations
than the local ones since the changes which can be applied to all sites at once will only be small
compared to a change which can be applied at a single site only.

There are also several hybrid forms of algorithms. The multiboson algorithms discussed in this thesis
are usually a mixture of several local sweeps combined with a global step. Furthermore, local forms of
the multicanonical algorithms [151, 152] may also require the evaluation of the global action.
To estimate the dynamical critical exponent for a local algorithm, one has to remember that in a
single step the "information" is transmitted from a single site to its neighbors 01]. Consequently, the
information performs a random walk around the lattice. In order to obtain a "new" configuration, the
information must travel at least a distance of $\xi$, the correlation length. Therefore one would expect
$\tau \propto \xi^2$ near criticality, i.e. $z = 2$.

The potential advantage of global algorithms is that they may have a critical scaling exponent smaller
than for local algorithms. This can be attributed to the fact that, since all sites are updated at once, the
information need not travel stepwise from one lattice site to its neighbor, as is the case for a local
algorithm.

3.4.1 Metropolis Algorithm

The Metropolis algorithm has been introduced in [153]. It can be implemented both locally and globally
and has the following general form (which has been formulated in [154, 155]): The transition probability
$\mathcal{P}([\phi_i] \to [\phi_{i+1}])$ is the product of two probabilities, $\mathcal{P} = \mathcal{P}_A \cdot \mathcal{P}_C$, where

1. $\mathcal{P}_C([\phi_i] \to [\phi_{i+1}])$ generates a given probability density for the proposed change of the configuration.
A convenient choice may be that $\phi_{i+1}$ is taken from the random ensemble, Eq. (2.46),
independent of $\phi_i$.

2. The transition probability $\mathcal{P}_A$ is then given by

$$\mathcal{P}_A([\phi_i] \to [\phi_{i+1}]) \propto \min\left(1, \frac{\mathcal{P}_C([\phi_{i+1}] \to [\phi_i])\, p_{\mathrm{Eq}}(\phi_{i+1})}{\mathcal{P}_C([\phi_i] \to [\phi_{i+1}])\, p_{\mathrm{Eq}}(\phi_i)}\right) , \qquad (3.38)$$

where $p_{\mathrm{Eq}}(\phi_i)$ is the probability of $\phi_i$ in the equilibrium density of the Markov process, $\rho_{\mathrm{Eq}}[\phi]$.



Local Metropolis Update 

As an example we consider a lattice with field variables $\phi(x)$, $x \in \mathbb{Z}_\Omega$, which can take on continuous
values from the interval $[a,b]$, $a, b \in \mathbb{R}$. The task is to design a Markov process which generates field
configurations distributed according to a canonical ensemble, Eq. (2.48), i.e. according to $\exp[-S[\phi]]$,
where $S[\phi]$ is a multiquadratic action as discussed in App. C. A simple algorithm which implements
the local Metropolis update sweep is designed as follows:

1. For each lattice site $y$ compute the local staple $\Delta S[\Delta\phi(y)]$ corresponding to $\phi(y)$. For a definition
and actual computations of such a staple see App. C.

2. Suggest a randomly chosen new field variable $\phi'(y)$ from $[a,b]$ with staple $\Delta S[\Delta\phi'(y)]$, $\Delta\phi(y) =
\phi'(y) - \phi(y)$. Accept the new variable $\phi'(y)$ with probability

$$\min\left(1, \exp\left[\Delta S[\Delta\phi(y)] - \Delta S[\Delta\phi'(y)]\right]\right) , \qquad (3.39)$$

otherwise keep the old value $\phi(y)$.

3. Iterate step 2 a number of times.

4. Continue with the next site in item 1.

Afterwards, the entire lattice will have been updated. This algorithm is obviously ergodic since any
configuration can be reached due to the random proposal of $\phi'(y)$. Furthermore it satisfies detailed
balance (3.5) by construction. This form is the special case of the general algorithm where $\mathcal{P}_C(\cdot,\cdot) = 1$
and thus $\mathcal{P} = \mathcal{P}_A$ alone.
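
The sweep described above can be sketched for a toy model. The code assumes a hypothetical one-dimensional periodic lattice with the quadratic action $S = \sum_x [\tfrac{1}{2} m^2 \phi(x)^2 + \tfrac{1}{2}(\phi(x+1) - \phi(x))^2]$, chosen here only for illustration; `local_action` plays the role of the staple, and all names and parameters are assumptions rather than the implementation used in this thesis.

```python
import math
import random

def local_action(value, left, right, mass2):
    """Part of the assumed action that depends on a single site value."""
    return 0.5 * (mass2 + 2.0) * value * value - value * (left + right)

def metropolis_sweep(field, a, b, mass2, rng, hits=1):
    """One local Metropolis sweep in place; returns the fraction of accepted proposals."""
    n = len(field)
    accepted = 0
    for y in range(n):                               # visit every site once
        left, right = field[y - 1], field[(y + 1) % n]
        for _ in range(hits):                        # several proposals per site
            old_s = local_action(field[y], left, right, mass2)
            proposal = rng.uniform(a, b)             # uniform proposal from [a, b]
            new_s = local_action(proposal, left, right, mass2)
            # Accept with probability min(1, exp(old - new)).
            if new_s <= old_s or rng.random() < math.exp(old_s - new_s):
                field[y] = proposal
                accepted += 1
    return accepted / (n * hits)
```

A typical call would be `metropolis_sweep(field, -4.0, 4.0, 1.0, random.Random(7))` on a list of floats. With uniform proposals over a wide interval the acceptance rate is low, which illustrates the inefficiency that motivates drawing the proposal from a distribution closer to the desired one.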

The algorithm discussed above is applicable to almost any system with multiquadratic action, but it may
not be efficient. It may happen that those values of $\phi'(y)$ which have a high chance of being accepted
are strongly peaked around a small subinterval and consequently most suggestions are rejected. In such
cases it is therefore preferable to take $\phi'(y)$ from a non-uniform distribution which is very close (or
even identical) to the desired distribution. In this case, the Metropolis decision will have to be modified
accordingly. In the latter case, if $\phi'(y)$ has already been taken from the correct distribution, the test can
even be skipped (since this situation would correspond to the case $\mathcal{P}_A = 1$ and consequently $\mathcal{P} = \mathcal{P}_C$).
This is the idea of the heatbath algorithm which is discussed in Sec. 3.4.2.

Global Metropolis Update

In contrast to the algorithm above, it is also possible to postpone the Metropolis decision until all lattice
sites have been processed. This is the idea of the global Metropolis update. This may be necessary in
a situation where the action cannot be written in the form of a local staple, or if this step is too costly.
In general, the Metropolis decision will take the following form

$$\mathcal{P}_A = \min\left(1, \exp\left[S[\phi] - S[\phi']\right]\right) . \qquad (3.40)$$

However, when choosing $\phi'$ to be a random configuration, the action $S[\phi']$ will usually be widely different
from $S[\phi]$, and thus the exponent will become large and negative. To be specific, the probability of acceptance
is given by the $\Omega$th power of the single-site acceptance rate, where $\Omega$ is the lattice volume. For any
reasonable lattice size, this number will consequently be prohibitively small, even if the single-site
acceptance were of the order of $\mathcal{O}(99\%)$.

Therefore a global Metropolis step can only be applied in the following situations:

• The distribution of $\phi'$ is close to the desired one. Hence, the sampling process was able to generate
almost the "correct" distribution and one merely has to correct a small residual error.

• The proposal $\phi'$ is very close to the old configuration $\phi$. In such a case one has to make sure
that ergodicity still holds and even if it does, the danger of running into metastabilities may be
larger. Furthermore the autocorrelation times may not be very favorable in this situation since
the evolution in phase space is rather slow. For the effort of processing all sites a much smaller
path has been traversed than in the case of the local algorithms; this explains why e.g. the HMC
algorithm is not competitive with local algorithms when the local form of the action is available (see
below).

The global form of the Metropolis algorithm therefore usually appears in combination with some
other algorithm (either of global or local nature) which generates a suitable proposal $\phi'$ such that the
acceptance rate, Eq. (3.40), stays reasonably large.

3.4.2 Heatbath Algorithm

As has already been pointed out in the previous section, the heatbath algorithm generates a sample
from a distribution which is identical to the equilibrium distribution $\rho_{\mathrm{Eq}}[\phi]$. The name of the algorithm
expresses the procedure of bringing the system in contact with an infinite heatbath. If there exists
a global heatbath algorithm, then it will immediately generate the new configuration independent of
the old one, thereby eliminating all autocorrelations. This fortunate situation is only seldom given,
however. In many situations, it is possible to apply the heatbath at least locally, i.e. to generate a
candidate $\phi'(y)$ at a site $y$ independent from the old value $\phi(y)$ such that $\phi'(y)$ is distributed according
to

$$p[\phi'(y)] \propto \exp\left[-\Delta S[\phi'(y)]\right] . \qquad (3.41)$$

Since repeated application of the local Metropolis update prescription generates a Markov chain for $\phi$
at lattice site $y$ which also satisfies detailed balance, it will have a fixed point distribution which is
precisely given by Eq. (3.41). Hence, repeating the local Metropolis update an infinite number of times on a
single site is identical to the local heatbath algorithm.

Finding the distribution (3.41) is possible once its integral is known, i.e. [21]

$$\exp\left[-\Delta S[\phi'(y)]\right] d\phi'(y) = dE_{\Delta S}(\phi'(y)) . \qquad (3.42)$$

(3.43)

Often, it is not possible to directly generate the desired distribution (3.43), but rather only an approximation.
Call this approximation $p_0[\phi(y)]$, whose integral is known. Now generate the new variable $\phi'(y)$
and correct for the difference to the desired distribution, $p[\phi(y)]$, with a Metropolis step with probability
[21]

$$\mathcal{P} = \frac{p[\phi'(y)]}{p_0[\phi'(y)]}\, \min_{a \le \phi(y) \le b} \frac{p_0[\phi(y)]}{p[\phi(y)]} .$$

The total transition probability matrix for this process is given by

$$\mathcal{P} = \langle R\rangle\, p[\phi(y)] + \left(1 - \langle R\rangle\right)\mathbf{1} ,$$

with $\langle R\rangle$ being the average acceptance rate which depends on the quality of the approximation
$p_0[\phi(y)]$. Iterating this step $M$ times yields the transition probability matrix

$$\mathcal{P}^{(M)} = \left(1 - (1 - \langle R\rangle)^M\right) p[\phi(y)] + (1 - \langle R\rangle)^M\, \mathbf{1} .$$

In the limit $M \to \infty$ the desired distribution is recovered. However, it is sufficient to just iterate
this step $M$ times (where the optimal value of $M$ should be determined such that the algorithm has
the highest efficiency) since the stationary distributions of $\mathcal{P}^{(M)}$ and $\mathcal{P}^{(\infty)}$ coincide by virtue of the
properties of the Markov process.

One can also choose to iterate the transition as long as the proposed change is accepted, i.e. stop
the iteration once one proposal has been rejected. This procedure will also have the same stationary
distribution. Again, considerations of numerical efficiency should decide which choice is optimal.
In the following, several implementations of local heatbath algorithms which are needed for the multiboson
algorithm are presented.
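
The correction of an approximate heatbath by an accept/reject step, iterated a fixed number of times, can be illustrated with a single-site toy density. The sketch below assumes a hypothetical target $p(\phi) \propto \exp(-\phi^2/2 - g\phi^4)$ on the whole real line (rather than a finite interval) and a Gaussian approximation $p_0(\phi) \propto \exp(-\phi^2/2)$; the function name and the parameter `g` are illustrative choices, not taken from this thesis.

```python
import math
import random

def corrected_heatbath(old_value, g, max_tries, rng):
    """Update one variable towards p ~ exp(-phi^2/2 - g*phi^4) using Gaussian proposals.

    The ratio p/p0 = exp(-g*phi^4) has its maximum 1 at phi = 0, so a proposal is
    accepted with probability exp(-g*phi'^4). If all tries fail, the old value is kept.
    """
    for _ in range(max_tries):
        proposal = rng.gauss(0.0, 1.0)               # draw from the approximation p0
        if rng.random() < math.exp(-g * proposal ** 4):
            return proposal                          # accepted: independent of old_value
    return old_value                                 # all proposals rejected
```

Keeping the old value after `max_tries` rejections corresponds to the identity part of the iterated transition matrix, so the stationary distribution is the target for any number of tries; only the autocorrelation depends on that number.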



Heatbath for Gauge Fields 

First consider the case of the Wilson action, Eq. (C.6) from App. C, for an SU(2) gauge theory |Sl ll56| .
The distribution to be generated for a single gauge variable $U = U_\mu(y)$ then takes the form

$$dE_{\Delta S}(U) \propto \exp\left[\frac{\beta}{2}\,\mathrm{Re\,Tr}\, U\mathcal{S}\right] dU . \qquad (3.44)$$

The link variable $U \in \mathrm{SU}(2)$ can be parameterized as

$$U = a_0 + i\sum_{r=1}^{3} \sigma_r a_r .$$

The unitarity condition implies

$$U^\dagger \cdot U = a_0^2 + \sum_{r=1}^{3} a_r^2 = a^2 = 1 , \qquad a_0 = z\,(1 - |\vec a|^2)^{1/2} ,$$

where $z = \pm 1$, and $|\vec a| = \left(\sum_{r=1}^{3} a_r^2\right)^{1/2}$. The Haar measure $dU$ in Eq. (3.44) can be parameterized as

$$dU = \frac{1}{2\pi^2}\,\delta(a^2 - 1)\, d^4a .$$

In the present form, all parameters depend in a non-linear way on the distribution and the precise form
of the staple $\mathcal{S}$. Thus, it appears that generating the desired distribution is a tough problem. However,
it is possible to exploit the invariance of the Haar measure $dU$ on the gauge group and to rotate the
l.h.s. of (3.44) to a distribution which only depends on $\det\mathcal{S}$. This step significantly simplifies the
problem; consider the SU(2)-projection $\bar U = \mathcal{S}/\sqrt{\det\mathcal{S}} = \mathcal{S}/k$ (cf. Eq. (B.5) in App. B.3.1). Obviously,
$\bar U \in \mathrm{SU}(2)$ holds, so the Haar measure stays invariant under right multiplication with $\bar U^{-1}$,

$$dE_{\Delta S}(U\bar U^{-1}) \propto \exp\left[\frac{\beta}{2}\,k\,\mathrm{Re\,Tr}\, U\bar U^{-1}\bar U\right] dU = \exp\left[\frac{\beta}{2}\,k\,\mathrm{Re\,Tr}\, U\right] dU = \exp\left[\beta k a_0\right] \frac{1}{2\pi^2}\,\delta(a^2 - 1)\, d^4a . \qquad (3.45)$$

In this form, the distribution only depends on the determinant of the staple through $k$, and only the component $a_0$ has
a non-trivial distribution. Once it has been chosen, the remaining components, $\vec a$, are a random
point on the sphere of radius $(1 - a_0^2)^{1/2}$ in three-dimensional space and can be chosen, for instance, according to
$d^2\Omega_{\vec a} = d\phi\, d(\cos\theta)$. The distribution for $a_0$ is given by (with $a_0 \in [-1,1]$):

$$p[a_0] \propto \sqrt{1 - a_0^2}\, \exp(\beta k a_0) . \qquad (3.46)$$

By applying the transformation $y = \exp(\beta k a_0)$ one obtains

$$p[y] \propto \left[1 - \left(\frac{\log y}{\beta k}\right)^2\right]^{1/2} . \qquad (3.47)$$

This distribution can be generated by the method from Eq. (3.43) by choosing the proposal for $y$, and thus for $a_0'$,
from the interval $[\exp(-\beta k), \exp(\beta k)]$. An alternative method has been introduced by Kennedy and
Pendleton in [157]. This method is superior if the distribution for $a_0$ is peaked close to one, a situation
which is typically encountered in multiboson algorithms. While the method from Eq. (3.47) becomes
less efficient for sharply peaked distributions, the latter choice will soon become superior.
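
The first of the two methods can be sketched as follows: a value of $y$ is drawn uniformly from the interval, converted to $a_0$, and accepted with the remaining square-root factor. This is an illustration under stated assumptions, not the implementation used in this thesis and not the Kennedy-Pendleton method; the function names are hypothetical, the argument `beta_k` stands for the product $\beta k$ and must be positive and small enough for `math.exp` not to overflow.

```python
import math
import random

def sample_a0(beta_k, rng):
    """Draw a0 in [-1, 1] with density ~ sqrt(1 - a0^2) * exp(beta_k * a0)."""
    lo, hi = math.exp(-beta_k), math.exp(beta_k)
    while True:
        y = rng.uniform(lo, hi)                  # flat in y gives the factor exp(beta_k * a0)
        a0 = math.log(y) / beta_k
        a0 = max(-1.0, min(1.0, a0))             # guard against rounding at the interval ends
        if rng.random() < math.sqrt(1.0 - a0 * a0):   # accept with the remaining factor
            return a0

def sample_su2_point(beta_k, rng):
    """Return (a0, a1, a2, a3) with unit norm: a0 from sample_a0, direction of (a1, a2, a3) random."""
    a0 = sample_a0(beta_k, rng)
    radius = math.sqrt(1.0 - a0 * a0)
    cos_theta = rng.uniform(-1.0, 1.0)           # flat in cos(theta) and phi
    phi = rng.uniform(0.0, 2.0 * math.pi)
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    return (a0,
            radius * sin_theta * math.cos(phi),
            radius * sin_theta * math.sin(phi),
            radius * cos_theta)
```

For large `beta_k` the accepted values of $a_0$ cluster near one, where the square-root factor is small, so the rejection rate of this simple scheme grows. This is the regime in which the text above states that the alternative method becomes preferable.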
Once the new $\{a_0', \vec a'\}$ have been obtained in this way, the new link proposal can be obtained by applying
the inverse rotation in Eq. (3.45), thus yielding

$$U' = \left(a_0' + i\sum_{r=1}^{3} \sigma_r a_r'\right) \bar U^{-1} . \qquad (3.48)$$



An extension of this procedure to the case of SU($N$) gauge theories with $N > 2$ is more difficult since
they do not share the property that any sum of group elements is proportional to a group element. A
possible generalization has been proposed in [158]. The basic idea is to decompose the whole SU($N$)
group into an appropriate set of SU(2) subgroups such that no subgroup is left invariant. Call this set
$\{a_k\}$, $k = 1, \dots, q$. A possible choice is $q = N - 1$ with

$$a_k = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \alpha_k & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} , \qquad \alpha_k \in \mathrm{SU}(2) .$$

The new field variable $U'$ is finally chosen to be

$$U' = a_q \cdot a_{q-1} \cdot \ldots \cdot a_1 \cdot U .$$

Defining

$$U^{(k)} = a_k \cdot a_{k-1} \cdot \ldots \cdot a_1 \cdot U , \qquad U^{(0)} = U ,$$

one obtains the recursion

$$U^{(k)} = a_k \cdot U^{(k-1)} , \qquad U^{(q)} = U' .$$

Now each multiplication with $a_k$ gives rise to a heatbath distribution of the SU(2) group, Eq. (3.44).
Hence, one has to take

$$\frac{\beta}{N}\,\mathrm{Re\,Tr}\left(a_k \cdot U^{(k-1)}\mathcal{S} + \dots\right) = \frac{\beta}{N}\,\mathrm{Re\,Tr}\left(\alpha_k \rho_k\right) + \dots , \qquad (3.49)$$

where $\rho_k$ now takes over the role of the SU(2)-staple in Eq. (3.44). For the proof that this procedure
does indeed generate the desired distribution consult [21, 158].

Heatbath for Scalar Fields 

In the case of scalar fields one encounters actions of the type (C.5). One prominent example is the
evaluation of the fermion matrix in sampling algorithms (see Sec. 3.5.1). Another case of major importance
is the evaluation of correlation functions like Eq. (3.37). These systems allow for a rather simple
implementation of both local and global heatbath algorithms. In fact, this is one of the few cases where
a global heatbath algorithm exists. The application of the local algorithm is straightforward: For each
site $x \in Z_\Omega$ generate a Gaussian random number $\eta$ with width 1, i.e.

\[ p[\eta] \propto \exp(-|\eta|^2) . \]

The new field variable, $\phi'(x)$, is then given by

M 

2 



4>'{x) = ~a- l 1 L- \Y.a l [4>(ff(x))--^Ur{x))] ) . (3.50) 



There also exists a more powerful variant which is applicable if the total action admits the following
form (as is the case for fermionic actions):

\[ S = \sum_{xyz} \phi^\dagger(y)\,Q^\dagger(y,z)\,Q(z,x)\,\phi(x) . \tag{3.51} \]

Similarly to the local case, the generation of the $\phi(x)$ proceeds by taking a random Gaussian vector
$\eta(x)$ with unit width, i.e.

\[ p[\eta(x)] \propto \exp\left(-\sum_x \eta^\dagger(x)\eta(x)\right) . \]

Then solve the equation

\[ \sum_x Q(y,x)\,\phi(x) = \eta(y) . \tag{3.52} \]

Thus, the global heatbath requires a matrix inversion for each new sample $\phi(x)$. This is rather costly
compared to the local variant; however, the advantage is that there is no autocorrelation at all for the
whole sample of $\{\phi(x)\}$ generated.
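The global heatbath of Eqs. (3.51) and (3.52) can be illustrated on a small dense matrix. This is only a sketch under stated assumptions: `q_matrix` is a hypothetical stand-in for the fermion matrix, a direct solver replaces the iterative inversion used in practice, and "unit width" is taken to mean $p[\eta] \propto \exp(-\eta^\dagger\eta)$, i.e. real and imaginary parts of variance 1/2.

```python
import numpy as np

def global_heatbath(q_matrix, rng, n_samples=1):
    """Draw independent samples phi with p[phi] ~ exp(-phi^dagger Q^dagger Q phi).

    Each column of the returned array `phi` is one sample; `eta` holds the
    Gaussian noise vectors that were used.
    """
    n = q_matrix.shape[0]
    # Complex Gaussian noise with p[eta] ~ exp(-eta^dagger eta).
    eta = (rng.standard_normal((n, n_samples))
           + 1j * rng.standard_normal((n, n_samples))) / np.sqrt(2.0)
    # Solve Q phi = eta, Eq. (3.52); one inversion per sample.
    phi = np.linalg.solve(q_matrix, eta)
    return phi, eta
```

Since every sample is built from fresh noise, successive samples are statistically independent, and their covariance approaches $(Q^\dagger Q)^{-1}$.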

Several methods for performing the matrix inversion are discussed in detail in Sec. 3.6. All these
methods provide an approximation with a residual error $\epsilon$. The question arises how small this error
should be made. Choosing the residual error too large will result in a bias, introducing systematic errors
beyond control. One could make the residual error extremely small, i.e. several orders of magnitude
below the statistical error inherent in the Monte Carlo integration, but this wastes computer
time on an inverse of unnecessarily high accuracy. This question has been addressed in several
publications, see [159, 160, 161] and references therein. An improvement to these standard methods
has been suggested in [162], which allows for a reduction of the computer time required by a factor of
about 2 - 3 while still generating the correct distribution. The idea is again to sample an approximate
distribution and apply a Metropolis correction step. Consider a vector $\chi(x)$ distributed according to
$p[\chi(x)] \propto \exp(-\sum_x|\chi(x) - \sum_y Q(x,y)\phi(y)|^2)$. Then consider the joint distribution

\[
p[\phi(x), \chi(x)] \propto \exp\left(-\sum_x\Big|\sum_y Q(x,y)\phi(y)\Big|^2
 - \sum_x\Big|\chi(x) - \sum_y Q(x,y)\phi(y)\Big|^2\right) .
\]

By virtue of

\[
\int [d\phi]\,\exp\left(-\sum_x\Big|\sum_y Q(x,y)\phi(y)\Big|^2\right)
 = \frac{1}{Z_\chi}\int [d\phi][d\chi]\,\exp\left(-\sum_x\Big|\sum_y Q(x,y)\phi(y)\Big|^2
 - \sum_x\Big|\chi(x) - \sum_y Q(x,y)\phi(y)\Big|^2\right) ,
\]

the distribution of $p[\phi]$ is unchanged. Now one can update $\chi(x)$ and $\phi(x)$ with the following
alternating prescription:

1. Perform a global heatbath on $\chi(x)$,

\[ \chi(x) = \eta(x) + \sum_y Q(x,y)\,\phi(y) , \]

where $\eta(x)$ is a random Gaussian vector with unit width.
2. Perform the reflection

\[ \phi'(x) = \sum_y Q^{-1}(x,y)\,\chi(y) - \phi(x) , \tag{3.53} \]

which yields the new vector $\phi'(x)$.

The second step conserves the probability distribution of $\phi$ but is not ergodic. The first step ensures
ergodicity. The matrix inversion in (3.53) can now be performed with a finite accuracy $\epsilon$, yielding an
approximate solution $\zeta(x)$ which satisfies

\[ \sum_y Q(x,y)\,\zeta(y) = \chi(x) - r(x) , \]

where $r(x)$ is the residual. Now the second step can be considered as a proposal $\phi'(x) = \zeta(x) - \phi(x)$.
It will be accepted in a Metropolis step with probability (cf. Eq. (3.38))

\[ P_{acc}(\phi(x) \to \phi'(x)) = \min\left(1, \exp(-\Delta S)\right) , \tag{3.54} \]

where

\[
\Delta S = \sum_x\Big|\sum_y Q(x,y)\phi'(y)\Big|^2 - \sum_x\Big|\sum_y Q(x,y)\phi(y)\Big|^2
 + \sum_x\Big|\chi(x) - \sum_y Q(x,y)\phi'(y)\Big|^2 - \sum_x\Big|\chi(x) - \sum_y Q(x,y)\phi(y)\Big|^2
 = 2\,\mathrm{Re}\sum_x r^\dagger(x)\sum_y\big(Q(x,y)\phi(y) - Q(x,y)\phi'(y)\big) . \tag{3.55}
\]

If the matrix inversion is solved exactly, i.e. $|r(x)| = 0$, then one will recover the original global heatbath
algorithm. It has been discussed in [162] that there exists an optimal choice of $\epsilon \approx 10^{-3} - 10^{-4}$ which
reduces the computer time by a factor of 2 - 3 over the older methods.
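The two-step prescription with an inexact inversion can be sketched as follows. This is a toy illustration with hypothetical names, not the implementation of [162]: a dense matrix `q_matrix` stands in for the fermion matrix, and the truncated iterative solver is emulated (an assumption made only for this example) by perturbing the exact solution by a vector of size `solver_error`.

```python
import numpy as np

def joint_action(q_matrix, phi, chi):
    """|Q phi|^2 + |chi - Q phi|^2, the exponent of the joint distribution."""
    q_phi = q_matrix @ phi
    return np.vdot(q_phi, q_phi).real + np.vdot(chi - q_phi, chi - q_phi).real

def quasi_heatbath_step(q_matrix, phi, rng, solver_error=1e-3):
    """One update of phi; returns (phi_out, chi, phi_proposed, delta_s, accepted)."""
    n = phi.size
    eta = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    chi = eta + q_matrix @ phi                      # step 1: heatbath on chi
    zeta = np.linalg.solve(q_matrix, chi)
    # ASSUMPTION: emulate a solver stopped at finite accuracy.
    zeta = zeta + solver_error * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    residual = chi - q_matrix @ zeta                # r = chi - Q zeta
    phi_new = zeta - phi                            # step 2: approximate reflection
    # Right-hand side of Eq. (3.55): 2 Re r^dagger (Q phi - Q phi').
    delta_s = 2.0 * np.vdot(residual, q_matrix @ (phi - phi_new)).real
    accepted = bool(delta_s <= 0.0 or rng.random() < np.exp(-delta_s))
    return (phi_new if accepted else phi), chi, phi_new, delta_s, accepted
```

For `solver_error=0` the residual vanishes, `delta_s` is zero up to rounding, and the step reduces to the exact reflection, which is always accepted.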

3.4.3 Overrelaxation 

A particular method to improve the behavior of the system near criticality consists of overrelaxation.
It is similar to the technique of overrelaxation in differential equation algorithms [163, 164]. The idea
can also be generalized to gauge theories [156, 165]. An overrelaxation step performs a reflection in the
space of field elements which keeps the action invariant. When applying the local Metropolis decision,
Eq. (3.39), the change is thus always accepted. Since the action does not change, the algorithm is
non-ergodic and generates the microcanonical ensemble, Eq. (2.45); it does, however, satisfy detailed
balance, Eq. (3.5). Consequently, it cannot be used as the only updating scheme, but it can increase
the motion of the system in phase space if mixed with an ergodic algorithm. In this way, the expected
improvement may result in a dynamical critical scaling exponent of about $z \approx 1$, cf. [24].
For a multiquadratic action of the form (C.5), a local overrelaxation step may simply be implemented
by choosing the new field $\phi'(y)$ to be

M 

4>'(y) = -m ^r 1 E °* (/'(»)) ■ • > (/"' (f ))] ■ ( 3 - 56 ) 

i=2 

For the Wilson action of the SU(2) gauge theory, the overrelaxation step can be performed by choosing
the new element $U'_\mu(y)$ as

\[ U'_\mu(y) = \tilde S_\mu^{-1}(y)\,U_\mu^\dagger(y)\,\tilde S_\mu^{-1}(y) , \tag{3.57} \]

where $\tilde S_\mu^{-1}(y)$ is given by $\tilde S_\mu^{-1}(y) = S_\mu^\dagger(y)/\sqrt{\det S_\mu(y)}$. This replacement leaves the action
invariant since (note that no summation over the index $\mu$ must take place!):

\[ \mathrm{Re\,Tr}\left(U'_\mu(y)S_\mu(y)\right) = \mathrm{Re\,Tr}\left(U_\mu(y)S_\mu(y)\right) . \]

Eq. (3.57) is equivalent to the following transformation:

\[ U'_\mu(y) = U_0\,U_\mu^{-1}(y)\,U_0, \qquad U_0 = S_\mu^{-1}(y)\sqrt{\det S_\mu(y)} = S_\mu^\dagger(y)/\sqrt{\det S_\mu(y)} . \tag{3.58} \]

It is possible to generalize (3.58) to the case of SU(N), N > 2, with the same Cabibbo-Marinari
decomposition as discussed in Sec. 3.4.2.
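The SU(2) overrelaxation step can be checked numerically in a few lines. The sketch below uses hypothetical helper names and assumes that the staple sum is a 2x2 complex matrix proportional to an SU(2) element, so that $U_0 = S^\dagger/\sqrt{\det S}$ is itself in SU(2).

```python
import numpy as np

def su2_from_quaternion(q):
    """Map a real 4-vector to the SU(2) matrix a0 + i*sigma.a (after normalization)."""
    a0, a1, a2, a3 = q / np.linalg.norm(q)
    return np.array([[a0 + 1j * a3, a2 + 1j * a1],
                     [-a2 + 1j * a1, a0 - 1j * a3]])

def overrelax_su2(link, staple):
    """Reflect the link about the staple direction: U' = U0 U^{-1} U0."""
    k = np.sqrt(np.linalg.det(staple).real)
    u0 = staple.conj().T / k          # U0 = S^dagger / sqrt(det S)
    return u0 @ link.conj().T @ u0    # U^{-1} = U^dagger for SU(2)
```

The new link stays in SU(2), leaves $\mathrm{Re\,Tr}(US)$ unchanged, and applying the step twice returns the original link, which is why the move is always accepted but cannot be ergodic on its own.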

3.5 Fermionic Sampling Algorithms 

The algorithms discussed in the previous sections have for a long time only been applicable to the case of
theories without dynamical fermions, i.e. the quenched approximation. The typical cost one has to pay if
one includes dynamical fermion contributions is a factor of about 100 - 1000. It was not until the
mid-1990s that sufficient computer power became available to also treat dynamical fermions numerically. One
further problem is that in a Yang-Mills theory including dynamical fermion contributions, Eq. (2.89),
the fermion determinant is a non-local object. Therefore global algorithms like the HMC had to be
employed. A possible way to rewrite (2.89) to obtain a purely local action has been put forward by
Lüscher [166]. This was the key to also using local sampling algorithms for systems with dynamical
fermions.

3.5.1 Sampling with the Wilson Matrix

The essential problem of lattice fermions is the evaluation of the determinant from Eq. (2.79). This can
be achieved by using a Gaussian integral over boson fields $\Phi^\dagger(x)$, $\Phi(x)$ [TfTfj



where the field $\Phi(x)$ has the same indices as the Grassmann field $\psi(x)$. The prefactor from the
integration in Eq. (3.59) is a constant which cancels in any observable and will hence be dropped from now
on. The determinant can be evaluated using a stochastic sampling process similar to the measurement
of observables using (3.1) for the evaluation of (2.40). Thus, the fermionic contributions can also be
written as a part of a measure.

However, there are some problems with the application of (3.59) to the Wilson matrix, Eq. (2.81). The
former is only defined for a Hermitian and positive-definite matrix, a condition clearly not fulfilled by
the Wilson matrix. Nonetheless, the product $Q^\dagger \cdot Q$ is Hermitian and positive definite, so (3.59) is
applicable. This expression corresponds to two dynamical, degenerate fermionic flavors.
A second problem regards the fact that the Wilson matrix will have eigenvalues close to zero when
describing sufficiently light fermions (cf. Sec. 2.6.1). Since (3.59) computes the inverse determinant, a
single noisy estimate may oscillate over several orders of magnitude and in sign, see e.g. [149]. Therefore, one
tries to compute the determinant rather than its inverse.
The above arguments result in the expression



When approximating the determinant in (3.60) with a finite sample of configurations, $\{\phi_i(x)\}$, one can
use the global heatbath applied to the scalar boson fields $\Phi(x)$ as discussed in Sec. 3.4.2. This requires
a matrix inversion. Algorithms to perform this inversion will be discussed in Sec. 3.6.

3.5.2 Hybrid Monte-Carlo Algorithm 

The idea behind the molecular dynamics-based algorithms is different from those discussed in the
previous sections. The key feature consists of using quantities obtained from averages of the microcanonical
ensemble (2.45) as an approximation to the average as given in Eq. (2.40) obtained from the canonical
ensemble. This identification works in the thermodynamic limit, i.e. in the case of large lattices. Such
an algorithm was first used in the context of pure gauge field theory in [168]. This
class of algorithms turned out to be applicable to the case of dynamical fermions and became the
standard method for this type of system. Closely related to this line of thinking is the idea of stochastic
quantization [169].

In order to simulate the pure gauge action (2.67) using some classical Hamiltonian formalism, consider
the partition function (2.39) for the random ensemble (2.46), applied to the quenched action,



Inserting a unit Gaussian integration with a field $P_\mu(x)$ carrying the same indices as $U_\mu(x)$ into the
partition function introduces an overall constant which does not change observables,




(3.59) 




(3.60) 



\[ Z = \int [dU][dP]\,\exp\left(-H[U,P]\right) , \tag{3.61} \]

where $H[U,P]$ is given by

\[ H[U,P] = \frac{1}{2}\sum_{x,\mu} P_\mu^2(x) + S_g[U] . \tag{3.62} \]



The phase space has been enlarged by the introduction of the new fields. Now the idea of the molecular
dynamics methods is to simulate a classical system, interpreting the function $H[U,P]$ in Eq. (3.62) as
the corresponding Hamiltonian and thus the new fields $P_\mu(x)$ as the canonical conjugate momenta of
$U_\mu(x)$. This method, however, will only simulate the microcanonical ensemble with the fixed "energy"
$H[U,P]$. Since the microcanonical ensemble can be used as an approximation to the canonical ensemble
as one approaches the thermodynamic limit, one can take the samples from sufficiently long classical
trajectories for very large lattices to compute observables.

Since the canonical momenta appearing in (3.62) have a Gaussian distribution independent of the fields,
one can extend the algorithm by not only considering a single classical trajectory, but several of them,
all starting with Gaussian distributed initial momenta. This would clearly solve the problem of lacking
ergodicity of the purely microcanonical approach. If the momenta are refreshed regularly during the
molecular dynamics evolution, one arrives at the Langevin algorithms [170]. The extreme case is to
refresh the momenta at each step, which would imply that one performs a random walk in phase space.
The other extreme is the purely molecular dynamics evolution, which moves fastest since it never changes
direction by refreshing the momenta, but lacks ergodicity. A combination of both approaches is given by the
hybrid classical Langevin algorithms [171], where at each step a random decision takes place whether
to reshuffle the momenta or not.

The culmination of the molecular dynamics algorithms is the hybrid Monte-Carlo algorithm (see
[172, 173] for the foundations, [174] for a more detailed discussion and [175, 133] for recent reviews).
The idea is again to simulate the classical equations of motion along a trajectory of a certain length;
this is easily achieved by integrating the canonical equations of motion, Eq. (2.5),

\[
\dot U_\mu(x) = \frac{\partial H[U,P]}{\partial P_\mu(x)} , \qquad
\dot P_\mu(x) = -\frac{\partial H[U,P]}{\partial U_\mu(x)} . \tag{3.63}
\]

The integration of the equations of motion can be done with various algorithms available for molecular
dynamics. Of particular interest are the symplectic integration schemes, see e.g. [176, 177] for an
introduction. A scheme which is of second order and which requires only a single force evaluation per
step is the leap-frog integration scheme. The integration of the equations of motion proceeds with a
finite step length, $\Delta t$. The leap-frog method has a systematic error of the order of $O(\Delta t^2)$, so the
actual trajectory in the simulation may differ from the exact solution of (3.63). This deviation can be
corrected for by a global Metropolis step similar to Eq. (3.40), but with the action $S[U]$ replaced by
the "Hamiltonian" $H[U,P]$. The integration of (3.63) is done for a certain number of steps, $n_{MD}$, which
is thus the length of an HMC trajectory. After the Metropolis decision has taken place, a new set of
Gaussian random "momenta" is drawn and the whole integration is started again.
A crucial point for the application of the molecular dynamics evolution is the reversibility of the
trajectory, i.e. replacing $\Delta t$ by $-\Delta t$ should return the system to exactly the same point in phase
space where it started. This condition is necessary for detailed balance, Eq. (3.5), to hold.
The generalization to fermionic field theory was suggested in [178]. It proceeds by considering the
partition function

\[
Z = \int [dU][dP][d\phi^\dagger][d\phi][d\pi^\dagger][d\pi]\,
 \exp\left(-H[U, \phi^\dagger, \phi, P, \pi^\dagger, \pi]\right) , \tag{3.64}
\]

where the "Hamiltonian" $H[U, \phi^\dagger, \phi, P, \pi^\dagger, \pi]$ is now given by

\[
H[U, \phi^\dagger, \phi, P, \pi^\dagger, \pi] = \ldots + S_g[U]
 + \sum_{xy} \phi^\dagger(x)\,(Q^\dagger Q)^{-1}(x,y)\,\phi(y) . \tag{3.65}
\]

The explicit form of the resulting equations of motion can be found e.g. in [137, 133].
By adjusting the step length, $\Delta t$, and the trajectory length, $n_{MD}$, between two Metropolis decisions,
one can tune the acceptance rate and optimize the algorithm to achieve best performance. In general,
the larger the total trajectory length, $\Delta t \cdot n_{MD}$, the lower the acceptance rate, since a longer trajectory
introduces larger numerical errors. This can be compensated by making $\Delta t$ smaller (and consequently
$n_{MD}$ larger), but this will increase the required computer time per trajectory by the same factor.
The suggestion by Creutz [179, 180] was to choose $\Delta t \cdot n_{MD} \approx O(1)$ and modify the two parameters
such that the acceptance rate is about $P_{acc} > 70\%$. This proposal has been tested numerically in [181].
A different investigation has been performed in the case of compact QED by Arnold in [151]. Of
particular interest is also the impact of 32-bit precision on the feasibility of the algorithm. As has been
shown in [182, 183], the systematic error (i.e. the non-reversibility of the HMC trajectory) in the case of
QCD with two dynamical fermion flavors is of the order of 2% on an $\Omega = 40 \times 24^3$ lattice. This error
should in any case be small compared to the statistical error of the quantities under consideration.
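The interplay of step length, trajectory length and acceptance rate can be illustrated with a toy HMC. The sketch below is not the gauge-field algorithm of the text: it assumes real variables `q` with a user-supplied action and gradient (hypothetical arguments `action` and `grad_s`), and it uses the leap-frog scheme with half steps for the momenta at both ends of the trajectory.

```python
import numpy as np

def leapfrog(q, p, grad_s, dt, n_md):
    """Integrate dq/dt = p, dp/dt = -dS/dq for n_md leap-frog steps of length dt."""
    q = q.copy()
    p = p - 0.5 * dt * grad_s(q)          # initial half step for the momenta
    for step in range(n_md):
        q = q + dt * p                    # full step for the coordinates
        if step < n_md - 1:
            p = p - dt * grad_s(q)        # full step for the momenta
    p = p - 0.5 * dt * grad_s(q)          # final half step for the momenta
    return q, p

def hmc_trajectory(q, action, grad_s, dt, n_md, rng):
    """One HMC update: momentum refresh, leap-frog trajectory, global Metropolis step."""
    p = rng.standard_normal(q.shape)      # Gaussian "momenta"
    h_old = 0.5 * np.sum(p * p) + action(q)
    q_new, p_new = leapfrog(q, p, grad_s, dt, n_md)
    delta_h = 0.5 * np.sum(p_new * p_new) + action(q_new) - h_old
    accepted = bool(delta_h <= 0.0 or rng.random() < np.exp(-delta_h))
    return (q_new if accepted else q), accepted, delta_h
```

For a quadratic toy action, halving $\Delta t$ at fixed $\Delta t \cdot n_{MD}$ reduces the energy violation by roughly a factor of four, in line with the $O(\Delta t^2)$ error quoted above, while the cost per trajectory doubles.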
In conclusion, the advantages of the HMC are that it has only two parameters, namely $\Delta t$ and $n_{MD}$,
that its optimization and tuning are well understood and under control, and that it is rather simple
to implement even for more complicated systems. In direct comparison to the local algorithms, it is,
however, less efficient. This is related to the fact that a local sweep changes each variable by a greater
amount than a global sweep, while it may still have a similar computational cost. In the quenched case of
QCD it soon became clear that algorithms like the HMC are not competitive with heatbath algorithms,
in particular if they are used together with overrelaxation techniques, see for a recent algorithmic review
e.g. [184].

There exist extensions of molecular dynamics-based algorithms which also allow one to handle odd numbers
of dynamical quark flavors. One method is the R-algorithm, which has a residual systematic error
that has to be kept smaller than the statistical error of observables [185]. A method which is free of
systematic errors has been proposed by Lippert in [186], but its efficiency may be rather limited due
to the presence of a nested iteration¹. A different approach has been suggested in [187] and exploited
in [188, 189].
One potential problem of the HMC scheme is related to the question of ergodicity. Although the
algorithm is exact and ergodic in the asymptotic limit, for finite time series it may get "stuck" in
certain topological sectors. In particular, in a study of dense adjoint matter, it has been shown in
[190] that the HMC method is not ergodic, while the MB algorithm retains ergodicity. This question
has also been raised by Frezzotti and Jansen [191], who introduced a variant of the HMC algorithm
[192, 193] using a static polynomial inversion similar to those discussed in Sec. 3.6.1 (see also [187]).
For a recent comparison of efficiencies of current algorithms see [194].

3.5.3 Multiboson Algorithms 

As discussed in the previous subsection, the standard HMC allows one to simulate the situation with an
even number of degenerate, dynamical fermion flavors, at the expense of having a global algorithm.
Furthermore, since ergodicity is only ensured in an asymptotic sense, one may ask whether it is
possible to use a different approach for the same problem. As has been shown by Lüscher [166], it is
possible to rewrite the action (2.89) in such a way that a purely local action is obtained which can be
treated by more efficient algorithms like local heatbath and overrelaxation. The algorithms based on
this idea are called multiboson algorithms (MB). For an overview of recent investigations consult [195].
For theoretical estimates of efficiency, especially compared to the HMC, see [196].

¹ It is possible that the quadratically optimized polynomials discussed in Sec. are able to handle this
iteration in an efficient way.

Consider a similarity transformation, Eq. (2.84), but applied to the non-Hermitian Wilson matrix
$Q(y,x)$. The resulting diagonal matrix will have all eigenvalues of $Q(y,x)$, $\lambda_i$, on its diagonal.
Then consider a polynomial of order n,

\[ P_n(x) = c_n \prod_{j=1}^{n} (x - z_j) , \tag{3.66} \]

which approximates the function $1/x$ over the whole spectrum of $Q^2(y,x)$ with a certain accuracy, $\epsilon$.
Applying this polynomial to the matrix $Q^2(y,x)$ will yield an approximation to $Q^{-2}(y,x)$, as can be
seen by applying (3.66) to the diagonal matrix from Eq. (2.84), since the resulting matrix will have the
inverse eigenvalues, $1/\lambda_i$, on its diagonal. This allows the fermionic action to be rewritten:

\[ S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\,(Q(y,z) - \rho_j^*)\,(Q(z,x) - \rho_j)\,\phi_j(x) , \tag{3.67} \]

where the $\rho_j$ are the roots of the $z_j$. The determinant is then computed via

\[
\det Q^2 \approx \frac{1}{\det P_n(Q^2)} \propto \int [d\phi^\dagger][d\phi]\,
 \exp\left(-\sum_j \sum_{xyz} \phi_j^\dagger(y)\,(Q(y,z) - \rho_j^*)\,(Q(z,x) - \rho_j)\,\phi_j(x)\right) .
\]

This action has the form of Eq. (C.1) and thus can be treated by local heatbath and overrelaxation
techniques, as discussed in Sec. 3.4. The system now incorporates the gauge fields $\{U_\mu(x)\}$ as
before, but in addition also $4N \times n$ scalar fields $\{\phi_j(x)\}$, since the polynomial has n roots and each field
has the same indices as a Dirac spinor times the Yang-Mills group number N. In the following these
fields will be referred to as "boson fields". Hence, it is apparent that the system of (3.67) has a
huge memory consumption and may have a relatively complicated phase space. In any case, one will
have to deal with n additional fields and the computational effort will still be enormous.
The central question now regards the optimal choice of the polynomial. Clearly, its order n should be
kept as small as possible while still maintaining a sufficiently good approximation. In any case, the
polynomial approximation in (3.66) is a static inversion (cf. Sec. 3.6). This means that once the choice
has been fixed, one cannot alter the polynomial during the sampling process anymore. For an overview
of the choices available, see Sec. 3.6.1.
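The role of the polynomial in Eq. (3.66) can be made concrete with a small numerical experiment. The sketch below is only an illustration: it uses a Chebyshev interpolant of $1/x$ on $[\epsilon, 1]$ as one possible choice of $P_n$ (an assumption, not necessarily the polynomial used in the thesis), and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def inverse_polynomial(order, eps):
    """Chebyshev interpolant of 1/x on [eps, 1]; one possible choice for P_n."""
    return Chebyshev.interpolate(lambda x: 1.0 / x, order, domain=[eps, 1.0])

def product_form(poly, x):
    """Evaluate P_n(x) = c_n * prod_j (x - z_j) from the (complex) roots z_j."""
    roots = poly.roots()
    x_ref = 0.5 * (poly.domain[0] + poly.domain[1])
    # Fix the normalization c_n by matching the polynomial at one reference point.
    c_n = poly(x_ref) / np.prod(x_ref - roots)
    x = np.atleast_1d(x)
    return (c_n * np.prod(x[:, None] - roots[None, :], axis=1)).real

def determinant_check(poly, eigenvalues):
    """det(Q^2 P_n(Q^2)) for a matrix Q^2 with the given eigenvalues."""
    return np.prod(eigenvalues * poly(eigenvalues))
```

If the spectrum of $Q^2$ lies inside the approximation interval, `determinant_check` stays close to one, which is the statement $\det Q^2 \approx 1/\det P_n(Q^2)$; each root $z_j$ of the product form corresponds to one boson field in (3.67).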



Even-Odd Preconditioning for MB Algorithms 

It is possible to incorporate the preconditioning technique introduced in Sec. 2.6.4 into the multiboson
approximation (3.67). However, since the matrix in (2.88) contains next-to-nearest neighbor
interactions, the square in (3.67) would introduce an even more complicated action which may have up to
fourth-neighbor terms and hence would be almost impossible to implement:

\[ \det Q^2 \approx \left(\det P(Q^2)\right)^{-1} = \prod_j \left(\det (Q - \rho_j^*)(Q - \rho_j)\right)^{-1} . \tag{3.68} \]

This problem has been solved in [197] by applying the Schur decomposition from Eq. (2.87) again to
the preconditioned action:

det (q - Pj ) o< det ( ) = det (q - P oPj ) , 

where $P_o$ denotes the projector on "odd" sites ($P_o = \mathrm{diag}(0, \ldots, 0, 1, \ldots, 1)$, which contains 0 on the
first half of the diagonal and 1 on the second half). The resulting preconditioned action is then given by

\[ S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\,(Q(y,z) - P_o\rho_j^*)\,(Q(z,x) - P_o\rho_j)\,\phi_j(x) . \tag{3.69} \]

This is the action which will be considered from here on.

Exact Multiboson Algorithms

The multiboson algorithm as discussed so far only uses an approximate polynomial with a residual
error $\epsilon$. One could decide to stay with this error and try to minimize it by increasing the order of the
polynomial n. But this would be a bad idea since the computer time and memory requirements
would become enormous. Thus, different proposals have been made to get rid of the residual error.
The original proposal [166] was to generate a sample of configurations using the action (3.67) as an
approximation to the "real" action in the sense of (2.49). Then one performs a reweighting of the
observables using (2.50). This procedure is free of systematic errors but it may introduce additional
noise in the measurement of observables if the initial approximation of (3.66) is bad. Therefore, this
approach has been abandoned in practical simulations.

The method which is used in current simulations is to apply a Metropolis step (3.40) after a set of
local sweeps [198, 199, 200, 201]. In this way the algorithm is free of any systematic error provided the
correction factor is computed with sufficient accuracy. The exact acceptance probability is given by

\[ P_{acc} = \min\left(1, \frac{\det\left(Q^2[U']\,P_n(Q^2[U'])\right)}{\det\left(Q^2[U]\,P_n(Q^2[U])\right)}\right) , \tag{3.70} \]

with $U'$ being the gauge field configuration after the local update sweeps and $U$ being the gauge field
configuration prior to the sweeps.

Still the problem remains to actually compute the ratio of the determinants in (3.70). The
straightforward evaluation with a noisy estimate vector $\eta$ using a global heatbath as discussed in Sec. 3.4.2 will
result in a nested iteration of an inversion algorithm and the polynomial $P_n(Q^2)$. In this sense, the
polynomial will act as a preconditioner.

Another approach has been suggested in [198]: One can obtain an estimate of the determinants by
computing the low-lying eigenvalues for which the chosen polynomial is only a bad approximation.
This allows one to compute the correction factor directly. For the smallest $L'$ eigenvalues $\{\lambda_i\}$, $i = 1, \ldots, L'$,
this yields

\[ \det\left(Q^2 P_n(Q^2)\right) \approx \prod_{i=1}^{L'} \lambda_i P_n(\lambda_i) . \tag{3.71} \]

This approximation is reasonable if the approximation $P_n(x)$ is inaccurate only for small x. Nonetheless,
there is no way to limit the systematic error unless $L'$ is determined dynamically.
Furthermore, this approach can be expected to scale badly with the volume since the eigenvalue density is
proportional to the volume and the total effort will at best scale as $\Omega^2$.
For a discussion of the effect of the polynomial quality on the acceptance factor, see [201].
Another suggestion lies at the basis of the Two-Step Multiboson (TSMB) algorithm proposed by
Montvay in [202]. This is discussed below.



Non-Hermitian Variant 

One can also use the non-Hermitian Wilson matrix, $Q(y,x)$, instead of $\tilde Q(y,x)$ for the construction of
the polynomial approximation. In this case, the action (3.67) takes on the following form:

\[ S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\,(Q^\dagger(y,z) - \rho_j^*)\,(Q(z,x) - \rho_j)\,\phi_j(x) . \tag{3.72} \]

This suggestion was first put forward by Boriçi and de Forcrand in [203]. It is directly applicable
to the case of an even number of mass-degenerate fermion flavors, just like the HMC. However, the
approximation (3.66) fails once a real eigenvalue becomes negative. This problem is avoided as long as the
fermion masses are still large. It is unclear, however, what will happen if the masses get small enough
that fluctuations may eventually cause the smallest real eigenvalue to cross the imaginary axis.
Since the effort of inverting the non-Hermitian matrix is lower than in the Hermitian case, the algorithm
is in principle more efficient whenever the aforementioned problem is avoided.

It is important to realize that an algorithm based on the expansion (3.72) will also be "exact" even if
a real eigenvalue becomes negative, provided it uses a correction step as discussed above. The correction
step will correct any errors in the polynomial approximation. However, the algorithm may become
inefficient since the acceptance rate would drop almost to zero once a point in phase space is reached
where the approximation becomes invalid.



TSMB Variant 

An extension of multiboson algorithms which allows one to handle situations with an arbitrary number of
fermion flavors has been suggested by Montvay in [202]. In particular, supersymmetric Yang-Mills
theory on the lattice has been examined (see for early reviews

HH EH)- For the physical results 

consult i^iwii^ii^immmn . 

This approach can immediately be generalized to the case of an arbitrary number of dynamical fermions,
in particular the physically interesting case (cf. Sec. 2.5.3) with three dynamical quark flavors [212]:
this is done by choosing a polynomial $P_{n_1}(x)$ (the reason for calling the polynomial order $n_1$ instead of
n will become clear soon) which approximates $x^{-\alpha}$, where $\alpha \neq 1$ is allowed. For $\alpha > 1$ one generally
requires a larger order n to achieve the same accuracy, while for $\alpha < 1$ a smaller n suffices. The
value of $\alpha$ determines the number of dynamical fermion flavors via $\alpha = N_f/2$ since the polynomial is
still applied to $Q^2$. Hence, for gluinos one has to choose $\alpha = 1/4$ leading to $N_f = 1/2$ [202]. The case
of three dynamical fermion flavors, as discussed in Chapter El requires the choice $\alpha = 3/2$.
The central idea regards the computation of the correction factor (3.70). The generalized correction
factor for $\alpha \neq 1$ takes the form:

\[ P_{acc} = \min\left(1, \frac{\det\left(Q^{2\alpha}[U']\,P_{n_1}(Q^2[U'])\right)}{\det\left(Q^{2\alpha}[U]\,P_{n_1}(Q^2[U])\right)}\right) . \tag{3.73} \]

The evaluation with a noisy estimate is highly difficult since now a (possibly non-integer) power of the
matrix Q will have to be inverted. The idea of Ref. [202] was to employ the multicanonical sampling
(cf. Sec. 2.3.4) to get an approximate action

\[ S[U] = S_g[U] + \ln\left[\det P_{n_1}(Q^2)\,\det P_{n_2}(Q^2)\right] , \tag{3.74} \]

where the polynomial $P_{n_2}(x)$ satisfies

\[ \det Q^{2\alpha} \approx \frac{1}{\det P_{n_1}(Q^2)\,\det P_{n_2}(Q^2)} . \]

This can be achieved by replacing the TSMB noisy correction step (3.73) by

\[ P_{acc} = \min\left(1, \frac{\det P_{n_2}(Q^2[U])}{\det P_{n_2}(Q^2[U'])}\right) . \tag{3.75} \]

In order to compute this ratio using a noisy correction vector, one uses the sampling prescription
discussed in Sec. 3.5.1. This requires the application of a global heatbath, as mentioned in
Sec. 3.4.2, which is very expensive since it would again require a nested inversion for the polynomial
$P_{n_2}(\cdot)$. Therefore, the suggestion of [202] was to use a third polynomial $P_{n_3}(x)$ of order $n_3$ which
approximates the inverse square root of $P_{n_2}(x)$,

\[ P_{n_3}(x) \approx \left(P_{n_2}(x)\right)^{-1/2} . \tag{3.76} \]

When applied to a matrix, one obtains

\[ P_{n_3}(Q[U]^2) \approx \left(P_{n_2}(Q[U]^2)\right)^{-1/2} . \tag{3.77} \]

The reason for this procedure becomes clear if one evaluates (3.75) using a noisy estimate. In practice,
a single noisy vector is usually sufficient [202]. Then the acceptance probability becomes

\[ P_{acc} = \min\left(1, \exp\left[-\eta^\dagger\left(P_{n_3}(Q[U]^2)\,P_{n_2}(Q[U']^2)\,P_{n_3}(Q[U]^2) - 1\right)\eta\right]\right) , \tag{3.78} \]

with $\eta(x)$ being a random Gaussian vector with unit width.
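The noisy correction step can be illustrated with small Hermitian matrices standing in for $P_{n_2}(Q[U]^2)$ and $P_{n_2}(Q[U']^2)$. The sketch below rests on several assumptions that are not taken from the text: the inverse square root is computed exactly by diagonalization instead of by a polynomial $P_{n_3}$, the noise has $p[\eta] \propto \exp(-\eta^\dagger\eta)$, and the noise is generated with the matrix of the old configuration, which is the ordering for which the noise average of the weight reproduces the determinant ratio of the old over the new configuration. All names are hypothetical.

```python
import numpy as np

def random_hermitian(eigenvalues, rng):
    """Hermitian matrix with the given (positive) eigenvalues and a random eigenbasis."""
    n = len(eigenvalues)
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    v, _ = np.linalg.qr(z)
    return (v * eigenvalues) @ v.conj().T

def inverse_sqrt(matrix):
    """Exact M^{-1/2} of a Hermitian positive-definite matrix (stand-in for P_n3)."""
    w, v = np.linalg.eigh(matrix)
    return (v / np.sqrt(w)) @ v.conj().T

def noisy_exponent(p2_old, p2_new, eta):
    """eta^dagger (P3_old P2_new P3_old - 1) eta, the exponent of the noisy weight."""
    p3_old = inverse_sqrt(p2_old)
    m = p3_old @ p2_new @ p3_old
    return np.vdot(eta, (m - np.eye(len(eta))) @ eta).real

def noisy_accept(p2_old, p2_new, rng):
    """Metropolis decision with a single Gaussian noise vector."""
    n = p2_old.shape[0]
    eta = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    exponent = noisy_exponent(p2_old, p2_new, eta)
    return bool(exponent <= 0.0 or rng.random() < np.exp(-exponent))
```

Averaged over the noise, $\exp(-\text{exponent})$ equals $\det(\text{old})/\det(\text{new})$ in this toy setting, so a single noise vector gives an unbiased estimate of the ratio entering the acceptance step.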

The approximation of $P_{n_3}(\cdot)$ in (3.76) determines the total residual error of the algorithm. There is no
way to correct for this error after the sampling has taken place since the error appears in the correction
step and cannot be rewritten as an extra term in the action. It is of vital importance to keep this
influence small. A precise investigation of the effects associated with this residual error can be found 
in Sec. EXT! 

After $n_3$ has been chosen sufficiently large, the total systematic error is governed by the second
polynomial, $P_{n_2}(\cdot)$. This systematic error, however, is present in the action (3.74) and can therefore be
corrected by the measurement correction, Eq. (2.50). As shown in [213], this can be done by considering
yet a further polynomial, $P_{n_4}(x)$, defined by

\[ P_{n_1}(x)\,P_{n_2}(x)\,P_{n_4}(x) \approx x^{-\alpha} . \tag{3.79} \]

The calculation of the expectation value of an operator $\langle A \rangle$ then proceeds by applying Eq. (2.50):

\[ \langle A \rangle = Z^{-1} \int [d\eta][dU]\,A[U]\,\exp\left[\eta^\dagger\left(1 - P_{n_4}(Q[U]^2)\right)\eta\right] , \tag{3.80} \]

with

\[ Z = \int [dU][d\eta]\,\exp\left[\eta^\dagger\left(1 - P_{n_4}(Q[U]^2)\right)\eta\right] . \]

The interval of the polynomial approximation for $P_{n_4}(\cdot)$ must be sufficiently large to cover the entire
eigenvalue spectrum of $Q^2[U]$ for all gauge fields in the sample. Since this may be problematic if
exceptional configurations with extremely small eigenvalues are present, one can combine the noisy
estimation of the correction factor in (3.80) with an exact computation of the corresponding factor for
the smallest eigenvalues (see [ZE|) :

\[ \langle A \rangle = Z^{-1} \int [dU]\,A[U] \prod_j (\lambda_j)^\alpha\,P_{n_1}(\lambda_j)\,P_{n_2}(\lambda_j) , \tag{3.81} \]

with $\lambda_j$ being the jth eigenvalue of the matrix $Q[U]^2$.

This procedure can also act as a preconditioner for the computation of $P_{n_4}(Q^2)$. The accuracy of $P_{n_4}(\cdot)$
can be adjusted until the correction factor has converged. However, as will be shown in Sec. 4.1.5, it is
in general not necessary to compute both the smallest eigenvalues and the correction factor using the
sampling in (3.80). Since the eigenvalue approximation of the quadratically optimized polynomials
converges extremely fast, it is sufficient to approximate the correction factor using the smallest eigenvalues
only.



3.6 Matrix Inversion Algorithms 

For the computation of the polynomials and the global heatbath in the previous sections, an inversion of
the fermion matrix is required. The problem is to find a solution vector $\phi(x)$ which solves the equation

\[ \sum_x Q(y,x)\,\phi(x) = \eta(y) , \tag{3.82} \]

for a given matrix $Q(y,x)$ and a given vector $\eta(x)$. The numerical effort of this problem depends
cubically on the size of the matrix 05] and monotonically on the condition number (see Sec. 3.7). If
the inverse condition number is of the order of or smaller than the machine precision, the matrix is
said to be "ill-conditioned", because the algorithms will in general be unable to yield a stable solution,
although the matrix entries may not pose any direct problem themselves. The aim of preconditioning
techniques is thus to reduce the condition number of the matrix $Q(y,x)$ without altering the solution.
Often techniques like those discussed in Sec. 2.6.4 also go by the name "preconditioner", although the
even-odd preconditioned matrix is different from the original one.

For the case under consideration in this thesis, inversion of the lattice Dirac matrices (3.67) or (3.69)
is required. These matrices typically have sizes of the order $N = 12\cdot\Omega$, which for a lattice of size
$\Omega = 32 \times 16^3$ is $N = 1572864$. Storage of the complete matrix would thus require about 18 TBytes
and is out of reach for current computer technology. Consequently, only an iterative solver may be
considered for the inversion of $Q(y,x)$. These solvers do not require the whole matrix to be stored in
memory, but only the availability of a matrix-vector multiplication. This step typically consumes
most of the computer time of the algorithm.
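
The size estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 8 bytes per matrix entry (a single-precision complex number) are an assumption, since the text does not state the storage format behind the quoted 18 TBytes.

```python
# Size of the lattice Dirac matrix and the memory a dense copy would need.
SITES = 32 * 16**3        # lattice volume Omega = 32 x 16^3
N = 12 * SITES            # 12 components per lattice site, N = 12 * Omega
BYTES_PER_ENTRY = 8       # assumption: single-precision complex entries

dense_bytes = N * N * BYTES_PER_ENTRY
dense_tib = dense_bytes / 2**40   # size in units of 2^40 bytes

print(N, dense_tib)
```

Under this assumption the dense matrix needs $18 \cdot 2^{40}$ bytes, which is why only the matrix-vector product is ever formed.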

From the repeated application of the matrix-vector multiplication, an approximation $\phi^l(x)$ of order
$O(Q^l)$ to the solution vector $\phi(x)$ is generated. Thus, these algorithms apply a polynomial $P(\cdot)$ of
order $l$ with the matrix $Q(y,x)$ as its argument to the starting vector $\eta(x)$, yielding the solution vector:

$$\begin{aligned}
\phi(y) \approx \phi^l(y) &= \sum_x P(Q)(y,x)\,\eta(x) \\
&= \sum_x \left(p_0 + p_1 Q(y,x) + p_2 Q^2(y,x) + \dots + p_l Q^l(y,x)\right)\eta(x) \\
&= \sum_x \left(p_0 + Q\cdot(p_1 + Q\cdot(p_2 + Q\cdot(\dots + Q\cdot p_l)))\right)(y,x)\,\eta(x),
\end{aligned} \tag{3.83}$$

where the order of the polynomial is given by $l$ (which is thus the number of iterations required).
Sometimes the iteration prescription can be cast in the form

$$\phi^{k+1}(y) = \sum_x S(y,x)\,\phi^k(x) + c(y), \tag{3.84}$$

where the matrix S(y, x) and the vector c(y) are independent of the iteration number k. Such methods 
are called "stationary". The Jacobi method, the Gauss-Seidel method and the (S)SOR methods are 
examples of such cases (cf. [2141 135] 1. 
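
The nested form in the last line of Eq. (3.83) is what makes the polynomial cheap to apply: only $l$ matrix-vector products are needed and no power of $Q$ is ever formed. A minimal Python sketch, assuming a small dense real matrix stored as a list of rows (the helper names `matvec` and `apply_polynomial` are made up for this example):

```python
def matvec(Q, v):
    """Dense matrix-vector product; stands in for the lattice operator."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


def apply_polynomial(coeffs, Q, eta):
    """Return (p_0 + p_1 Q + ... + p_l Q^l) eta using the nested (Horner) form."""
    result = [coeffs[-1] * e for e in eta]            # innermost term p_l * eta
    for p in reversed(coeffs[:-1]):
        # one matrix-vector product per remaining coefficient
        result = [p * e + q for e, q in zip(eta, matvec(Q, result))]
    return result


Q = [[2.0, 1.0], [0.0, 3.0]]
eta = [1.0, 1.0]
phi = apply_polynomial([1.0, 2.0, 3.0], Q, eta)       # (1 + 2Q + 3Q^2) eta
```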

A measure of the quality of the approximation in Eq. (3.83) is given by the norm of the residual
vector

$$\|r^l\| = \frac{\sum_x r^{l\dagger}(x)\,r^l(x)}{\sum_x \eta^\dagger(x)\,\eta(x)}, \tag{3.85}$$

where $r^l(x)$ is defined to be

$$r^l(y) = \eta(y) - \sum_x Q(y,x)\,\phi^l(x), \tag{3.86}$$

which should converge to zero as $l$ approaches infinity. In some cases the exact solution is already found
after a finite number of steps. In most practical situations, however, the exact solution cannot be found
due to the limited accuracy of the machines, and one is interested only in finding the solution in as few
steps $l$ as possible up to a certain accuracy $\|r^l\| < \epsilon$.

The solver determines the coefficients $\{p_0,\dots,p_l\}$ of the polynomial $P(\cdot)$ in Eq. (3.83) or, in some
cases, the recurrence coefficients of a recurrence relation. The algorithms may be divided into two classes:

• The coefficients of the polynomial are fixed prior to the iteration and do not depend on the shape
of the matrix $Q(y,x)$. This does not allow one to exploit any knowledge gained by the algorithm
during the iteration process, nor does it allow one to compensate for any rounding errors. Rather,
the rounding errors will usually add up, causing the iteration to saturate at some point where
further iterations do not increase the accuracy of the solution. This class of solvers is called
non-adaptive and is of great importance for multiboson algorithms; generally they are important
in those cases where an approximate inverse is required with a fixed series of coefficients. This is
the case e.g. for reweighting purposes.

• The coefficients are determined dynamically during the iteration itself. Thus, the solver may adapt
to the specific form of the matrix $Q(y,x)$. These algorithms are called adaptive solvers and are
in general superior to the non-adaptive algorithms in terms of required matrix-vector operations.
Furthermore, they are able to compensate better for rounding errors, so the accuracy which may
be achieved is higher than for non-adaptive ones. Their behavior for ill-conditioned matrices is
consequently improved as well. These algorithms are the method of choice if the inverse up to a
fixed accuracy is required.

For a complete overview of iterative solvers consult [45, 214]. The algorithms which have been employed
in this thesis are discussed in the following sections; all of them are efficiently parallelizable both
on MIMD (Multiple Instruction, Multiple Data) and on SIMD (Single Instruction, Multiple Data)
machines. For an explanation of the architectures see e.g. [215].

3.6.1 Static Polynomial Inversion 

The choice of the polynomial $P_n(x)$ in Eq. (3.66) is crucial for the applicability of multiboson algorithms.
The construction of any polynomial requires one to know at least the condition number of the Wilson
matrix. Usually more information is available regarding the spectrum, cf. Sec. 2.6.4 and also the
spectral density plots in Sec. 4.1. The original proposal of Lüscher [166] is to use an approximation
built from Chebyshev polynomials [45]. This approximation does not take care of the peculiarities of
the Wilson matrix and thus this choice is not the optimal one. It is, however, a safe method which is
applicable to any fermion representation if only the condition number is known.
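
To illustrate a Chebyshev-based approximation, the following Python sketch builds a degree-$n$ polynomial approximating $1/x$ on an interval $[\epsilon, \lambda]$ from a rescaled Chebyshev polynomial. It is the textbook construction and not necessarily the exact form used in [166]; the function names and the interval $[0.1, 3]$ are assumptions made for the example. The relative error $1 - xP_n(x)$ equals the rescaled Chebyshev polynomial itself, so it is bounded by $2\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^{n+1}$ with $\kappa = \lambda/\epsilon$, which depends on the condition number only.

```python
import math


def chebyshev_t(n, z):
    """Chebyshev polynomial T_n(z) via the three-term recurrence."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, z
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * z * t - t_prev
    return t


def chebyshev_inverse(x, n, eps, lam):
    """Degree-n polynomial approximation of 1/x on [eps, lam], evaluated at x."""
    z0 = (lam + eps) / (lam - eps)              # image of x = 0
    z = (lam + eps - 2.0 * x) / (lam - eps)     # maps [eps, lam] onto [-1, 1]
    r = chebyshev_t(n + 1, z) / chebyshev_t(n + 1, z0)   # r(0) = 1, so (1 - r)/x is a polynomial
    return (1.0 - r) / x


def error_bound(n, eps, lam):
    """Upper bound on |1 - x P_n(x)| over [eps, lam]."""
    root = math.sqrt(lam / eps)
    return 2.0 * ((root - 1.0) / (root + 1.0)) ** (n + 1)


eps, lam, n = 0.1, 3.0, 20
grid = [eps + (lam - eps) * i / 500 for i in range(501)]
max_error = max(abs(1.0 - x * chebyshev_inverse(x, n, eps, lam)) for x in grid)
```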

Quadratically Optimized Polynomials 

The quadratically optimized polynomials have been introduced by Montvay [202]. For a thorough
discussion and comparison to the Chebyshev polynomials see [216] and for further technical details
[217, 218]. The basic idea is to find the polynomial $P_n(x)$ which approximates a function $x^{-\alpha}$ (with
$\alpha = N_f/2$, cf. Sec. 3.5) in a given interval $[\epsilon, \lambda]$ in such a way that the relative deviation norm $\Delta$
defined via

$$\Delta = \left[\frac{1}{\lambda-\epsilon}\int_\epsilon^\lambda dx\,\left(1 - x^{\alpha}P_n(x)\right)^2\right]^{1/2} \tag{3.87}$$

is minimized. If $P_n(x)$ is expanded in coefficient form,

$$P_n(x) = \sum_{\nu=0}^{n} c_\nu\,x^{n-\nu},$$

the coefficients $\{c_0,\dots,c_n\}$ of the polynomial minimizing (3.87) are given by [216]

$$c_\nu = \sum_{\nu_1=0}^{n} M^{-1}_{\nu\nu_1}\,V_{\nu_1}, \tag{3.88}$$

with

$$V_\nu = \frac{\lambda^{1+\alpha+n-\nu} - \epsilon^{1+\alpha+n-\nu}}{(\lambda-\epsilon)(1+\alpha+n-\nu)}, \qquad
M_{\nu_1\nu_2} = \frac{\lambda^{1+2\alpha+2n-\nu_1-\nu_2} - \epsilon^{1+2\alpha+2n-\nu_1-\nu_2}}{(\lambda-\epsilon)(1+2\alpha+2n-\nu_1-\nu_2)}.$$


A straightforward computation of the $P_n(x)$ in terms of the expansion coefficients (3.88) is not practical,
however. The coefficients quickly become extremely large, so that the computation is no longer feasible
for larger polynomial orders, whereas orders of $n > 100$ are typically required. Fortunately, the
polynomials can be computed in terms of a recurrence relation which is stable even for orders of $n \approx 1000$
and beyond, at least if 64-bit precision is used.
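
For small orders the linear system behind Eq. (3.88) can still be solved directly. The sketch below does this in exact rational arithmetic, which sidesteps the rounding problems of the coefficient form; it assumes an integer $\alpha$ and rational interval bounds so that all moments are rational, and it uses the fact that the minimum of the squared deviation is $1 - \sum_\nu c_\nu V_\nu$. Function names and the interval $[1/10, 3]$ are illustrative only.

```python
from fractions import Fraction


def moment(k, eps, lam):
    """(lam - eps)^-1 times the integral of x^k over [eps, lam], integer k >= 0."""
    return (lam ** (k + 1) - eps ** (k + 1)) / ((lam - eps) * (k + 1))


def solve(matrix, rhs):
    """Exact Gauss-Jordan elimination over Fractions."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n] for row in a]


def optimized_coefficients(n, alpha, eps, lam):
    """Coefficients c_nu of P_n(x) = sum_nu c_nu x^(n - nu) and the squared deviation."""
    V = [moment(alpha + n - nu, eps, lam) for nu in range(n + 1)]
    M = [[moment(2 * alpha + 2 * n - n1 - n2, eps, lam) for n2 in range(n + 1)]
         for n1 in range(n + 1)]
    c = solve(M, V)
    delta_sq = 1 - sum(ci * vi for ci, vi in zip(c, V))
    return c, delta_sq


eps, lam = Fraction(1, 10), Fraction(3)
deviations = [optimized_coefficients(n, 1, eps, lam)[1] for n in range(1, 6)]
```

The matrix $M$ is of Hilbert type, so the same computation in floating-point arithmetic loses accuracy rapidly with growing $n$; the recurrence described next avoids this.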

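Take a set of polynomials $\{\Phi_\nu\}$ (e.g. Jacobi polynomials are possible choices [217]) satisfying the
orthogonality relation

$$\int_\epsilon^\lambda dx\; w(x)^2\,\Phi_\mu(x)\,\Phi_\nu(x) = \delta_{\mu\nu}\,q_\nu. \tag{3.89}$$

The weight function $w(x) = 1/x^{-\alpha}$ can be chosen. Then $P_n(x)$ can be expanded in terms of the
$\Phi_\nu$ with coefficients $d_\nu$,

$$P_n(x) = \sum_{\nu=0}^{n} d_\nu\,\Phi_\nu(x). \tag{3.90}$$

The coefficients $\{d_\nu\}$ are given by

$$d_\nu = \frac{b_\nu}{q_\nu}, \qquad b_\nu = \int_\epsilon^\lambda dx\; w(x)^2 f(x)\,\Phi_\nu(x), \tag{3.91}$$

where $f(x) = x^{-\alpha}$ is the function to be approximated. The polynomials can be constructed by the
three-term recurrence relation (see [216, 219])

$$\Phi_{\mu+1}(x) = (x + \beta_\mu)\,\Phi_\mu(x) + \gamma_{\mu-1}\,\Phi_{\mu-1}(x), \tag{3.92}$$

with

$$\beta_\mu = -\frac{p_\mu}{q_\mu}, \qquad \gamma_{\mu-1} = -\frac{q_\mu}{q_{\mu-1}}. \tag{3.93}$$

The factors $\{p_\mu\}$ are given by

$$p_\mu = \int_\epsilon^\lambda dx\; w(x)^2\,\Phi_\mu(x)^2\,x.$$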
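
The recurrence (3.92) with the coefficients (3.93) can be sketched in Python as follows. The integrals are replaced by a composite Simpson rule, which is an assumption of this example (the references may use closed-form or higher-precision evaluations); with monic $\Phi_\mu$ and $\Phi_0 = 1$ the code builds $\beta_\mu$, $\gamma_{\mu-1}$ and $d_\mu$ and then evaluates $P_n(x)$ by running the recurrence at the point $x$. All names and the interval $[0.1, 3]$ are illustrative.

```python
def simpson_rule(a, b, panels):
    """Nodes and weights of the composite Simpson rule (panels must be even)."""
    h = (b - a) / panels
    nodes = [a + i * h for i in range(panels + 1)]
    weights = [h / 3.0 * (1 if i in (0, panels) else 4 if i % 2 else 2)
               for i in range(panels + 1)]
    return nodes, weights


def build_recurrence(n, alpha, eps, lam, panels=2000):
    """Return beta_mu, gamma_(mu-1) and d_mu for mu = 0..n (gamma[0] is unused and set to 0)."""
    xs, ws = simpson_rule(eps, lam, panels)
    w2 = [w * x ** (2 * alpha) for x, w in zip(xs, ws)]   # quadrature weight times w(x)^2
    f = [x ** (-alpha) for x in xs]                       # function to approximate
    prev, cur = [0.0] * len(xs), [1.0] * len(xs)          # Phi_(-1) = 0, Phi_0 = 1
    q_prev = 1.0
    beta, gamma, d = [], [], []
    for mu in range(n + 1):
        q = sum(w * p * p for w, p in zip(w2, cur))
        p_mu = sum(w * p * p * x for w, p, x in zip(w2, cur, xs))
        b = sum(w * fx * p for w, fx, p in zip(w2, f, cur))
        d.append(b / q)
        beta.append(-p_mu / q)
        gamma.append(-q / q_prev if mu > 0 else 0.0)
        nxt = [(x + beta[mu]) * c + gamma[mu] * pv for x, c, pv in zip(xs, cur, prev)]
        prev, cur, q_prev = cur, nxt, q
    return beta, gamma, d


def evaluate(x, beta, gamma, d):
    """Evaluate P_n(x) = sum_mu d_mu Phi_mu(x) by running the recurrence at x."""
    prev, cur, total = 0.0, 1.0, 0.0
    for mu in range(len(d)):
        total += d[mu] * cur
        prev, cur = cur, (x + beta[mu]) * cur + gamma[mu] * prev
    return total


def deviation(n, alpha=1, eps=0.1, lam=3.0):
    """Relative deviation norm of Eq. (3.87), evaluated with the same Simpson rule."""
    beta, gamma, d = build_recurrence(n, alpha, eps, lam)
    xs, ws = simpson_rule(eps, lam, 2000)
    total = sum(w * (1.0 - x ** alpha * evaluate(x, beta, gamma, d)) ** 2
                for x, w in zip(xs, ws))
    return (total / (lam - eps)) ** 0.5
```

Calling `deviation(n)` for increasing `n` shows the deviation norm shrinking without any large intermediate coefficients appearing.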

The advantages of the quadratically optimized polynomials are that they only require the knowledge of
the eigenvalue interval $[\epsilon, \lambda]$ of the matrices whose inverse one is interested in. They provide a very good
approximation, which is worse only at the lower end of the interval, where the eigenvalue density is
decreasing, cf. Sec. 4.1. Furthermore, the quadratically optimized polynomials give a very simple way to
control the number of dynamical fermions to be simulated by a multiboson algorithm. This can directly be
done by adjusting the value of $\alpha$. Of great value is also the fact that they are very stable even for
large orders. Finally, they can efficiently be implemented on parallel computers since they only require
matrix-vector multiplications and vector-vector additions.

The disadvantage is that they may not take into account all information which is available about the 
matrix under consideration. In particular, the eigenvalue density is also decreasing on the upper end 
of the interval, although the quadratically optimized polynomials have good accuracy at this point. In 
this sense, one might hope to achieve better results by modifying the weight function w(x). This still 
leaves room for further improvement in the future.

Ultra-Violet Filtering

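An important preconditioning technique which has been introduced to the field of multiboson algorithms
by de Forcrand [220] is known as UV-filtering. It makes use of the identity

$$e^{-\mathrm{Tr}\,A}\,\det e^{A} = 1,$$

so that

$$\det(1-\kappa D) = \exp\left[-\sum_{j=0}^{M} a_j\,\mathrm{Tr}\,D^j\right]\cdot\det\left\{(1-\kappa D)\,\exp\left[\sum_{j=0}^{M} a_j D^j\right]\right\}. \tag{3.94}$$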



The order $M$ of the hopping parameter expansion can be adjusted to minimize the total effort. The
effect of UV-filtering on the order of the polynomial approximation has been examined in [220], and the
resulting algorithm has been shown to be superior to standard HMC in [221, 222]. It turns out that the
order $n$ can be reduced by a factor of about two.
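
The identity (3.94) holds for any choice of the coefficients $a_j$, which can be verified numerically on a toy example. The sketch below uses a $2\times 2$ matrix in place of the hopping matrix $D$ and, purely for illustration, sets $a_j = \kappa^j/j$ (the coefficients of the hopping-parameter expansion of $-\ln(1-\kappa D)$, with $a_0 = 0$); the actual coefficients used in [220] may differ. With this choice the filtered matrix is close to the unit matrix, which hints at why a lower polynomial order suffices.

```python
import math

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_scale(c, a):
    return [[c * v for v in row] for row in a]


def mat_exp(a, terms=30):
    """Matrix exponential by its Taylor series (adequate for small norms)."""
    result, term = IDENTITY, IDENTITY
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, a))
        result = mat_add(result, term)
    return result


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def uv_filter_check(D, kappa, M):
    """Return both sides of Eq. (3.94) and the filtered matrix."""
    S = [[0.0, 0.0], [0.0, 0.0]]                 # S = sum_j a_j D^j
    power = IDENTITY
    for j in range(1, M + 1):
        power = mat_mul(power, D)
        S = mat_add(S, mat_scale(kappa ** j / j, power))
    one_minus_kd = mat_add(IDENTITY, mat_scale(-kappa, D))
    filtered = mat_mul(one_minus_kd, mat_exp(S))
    lhs = det2(one_minus_kd)
    rhs = math.exp(-(S[0][0] + S[1][1])) * det2(filtered)   # Tr S = sum_j a_j Tr D^j
    return lhs, rhs, filtered


lhs, rhs, filtered = uv_filter_check([[0.3, 1.0], [-0.5, 0.2]], kappa=0.15, M=4)
```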

In order to find the polynomial $P_n(x)$ one applies an adaptive inverter (Ref. [201] uses the GMRES
method for this purpose) to a thermalized gauge field configuration. The polynomial will then
approximate

$$P_n(x) \approx (1-\kappa D)^{-\alpha}\,\exp\left[-\sum_j a_j D^j\right]. \tag{3.95}$$

However, for larger orders $n$, the iterations used to fix the coefficients of the polynomial become
numerically unstable. This is the reason why one needs the recursion form of the quadratically optimized
polynomials. The instability will thus limit the applicability of the expansion (3.94).

In conclusion, UV-filtering is a highly effective way to reduce the order of the polynomial and thus to
improve the algorithm to a large extent. On the other hand, one needs a thermalized configuration
(or even several of them) at the physical point one is interested in. In this respect, the method to
fix the polynomial $P_n(x)$ discussed in [222] will only become optimal after a certain run-time, once
thermalization is achieved.



3.6.2 Conjugate-Gradient Iteration 

The simplest adaptive iterative inverter is the Conjugate Gradient (CG) scheme, see e.g. [214] for a
reference implementation. It is also the oldest and best-known method for this problem. It requires
that the matrix $Q(y,x)$ is Hermitian and positive definite. The idea is to minimize the function

$$f(\phi(y)) = \frac{1}{2}\sum_{xy}\phi^\dagger(y)\,Q(y,x)\,\phi(x) - \sum_y \eta^\dagger(y)\,\phi(y). \tag{3.96}$$

This function is minimized when the gradient

$$\partial_y f(\phi(y)) = \sum_x Q(y,x)\,\phi(x) - \eta(y)$$

vanishes, which is equivalent to Eq. (3.82). The iteration prescription is to choose search directions
$p^k(y)$ that are orthogonal with respect to $Q$ and to minimize the function (3.96) along this direction
in every iteration step:

$$\phi^k(y) = \phi^{k-1}(y) + \alpha_k\,p^k(y). \tag{3.97}$$

Correspondingly, the residuals $r^k(y)$ are updated as

$$r^k(y) = r^{k-1}(y) - \alpha_k\sum_x Q(y,x)\,p^k(x). \tag{3.98}$$

The coefficients $\alpha_k$ are computed so as to minimize the function

$$\sum_{xy}\left(\phi^k-\phi\right)^\dagger(y)\,Q(y,x)\,\left(\phi^k-\phi\right)(x)$$

at each iteration step. Note that the existence of this minimum requires $Q(y,x)$ to be positive definite;
this is the reason why the CG algorithm only works for positive definite matrices. The minimization
is performed by choosing

$$\alpha_k = \frac{\|r^{k-1}(y)\|}{\|p^k(y)\|_Q},$$

with $\|a(y)\|_Q$ denoting the following norm of a vector $a(y)$:

$$\|a(y)\|_Q = \sum_{xy} a^\dagger(y)\,Q(y,x)\,a(x),$$

and $\|a(y)\|$ denoting the same expression with $Q(y,x)$ replaced by the unit matrix.


The search directions are iterated via

$$p^k(y) = r^{k-1}(y) + \beta_{k-1}\,p^{k-1}(y), \qquad \beta_k = \frac{\|r^k(y)\|}{\|r^{k-1}(y)\|}. \tag{3.99}$$

This choice of $\beta_k$ ensures that $p^k$ is orthogonal to all previous $Qp^m$ and that $r^k$ is orthogonal to
all previous $r^m$ ($m < k$) (cf. [214]). This is also the reason why the algorithm is called CG, since it
generates a series of orthogonal (or "conjugate") vectors. The iterate $\phi^k(x)$ is chosen from the
$k$-dimensional subspace spanned by these vectors, which is known as the "Krylov" subspace
$K_k(Q(y,x),\eta(y))$:

$$K_k(Q(y,x),\eta(y)) = \mathrm{span}\left\{r^0(y),\ \sum_x Q(y,x)\,r^0(x),\ \dots,\ \sum_x Q^{k-1}(y,x)\,r^0(x)\right\}. \tag{3.100}$$


It can be shown [144, 223] that for a Hermitian matrix $Q(y,x)$ an orthogonal basis for the Krylov
subspace can be constructed using only a three-term recurrence relation. Thus, such a recurrence is
also sufficient for constructing the residuals. In the CG algorithm this relation is replaced by two
two-term recurrences: one for the residuals $r^k(y)$ and one for the search directions $p^k(y)$.
The starting points of the iterations are chosen to be

$$\phi^0(y) = \eta(y), \qquad r^0(y) = \eta(y) - \sum_x Q(y,x)\,\phi^0(x).$$

Of course it is possible to choose a different starting vector $\phi^0(y)$, e.g. a good guess if
one is available or a random vector if all else fails.
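
The CG updates for the iterate, the residual and the search direction fit into a few lines. The following Python sketch assumes a small real symmetric positive definite matrix stored as a list of rows (the Hermitian case would need complex conjugation in the scalar products) and uses the starting vector $\phi^0 = \eta$ quoted above; the function names and the stopping rule on the relative residual are choices made for this example.

```python
def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(Q, eta, tol=1e-12, max_iter=200):
    """Solve Q phi = eta; returns the iterate and the number of iterations used."""
    phi = list(eta)                                    # phi^0 = eta
    r = [e - q for e, q in zip(eta, matvec(Q, phi))]   # r^0 = eta - Q phi^0
    p = list(r)                                        # first search direction
    rr = dot(r, r)
    eta_norm = dot(eta, eta)
    for k in range(max_iter):
        if rr <= tol * tol * eta_norm:                 # relative residual below tol
            return phi, k
        Qp = matvec(Q, p)
        alpha = rr / dot(p, Qp)                        # alpha = (r, r) / (p, Q p)
        phi = [x + alpha * d for x, d in zip(phi, p)]
        r = [x - alpha * d for x, d in zip(r, Qp)]
        rr_new = dot(r, r)
        beta = rr_new / rr                             # beta = (r_new, r_new) / (r, r)
        p = [x + beta * d for x, d in zip(r, p)]
        rr = rr_new
    return phi, max_iter


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
eta = [1.0, 2.0, 3.0]
phi, iterations = conjugate_gradient(Q, eta)
```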

The convergence of CG depends on the distribution of eigenvalues. With $\kappa_2$ being the spectral condition
number, an upper bound for the error can be given [144]:

$$\|\phi^l(x) - \phi(x)\|_Q \le 2\left(\frac{\sqrt{\kappa_2}-1}{\sqrt{\kappa_2}+1}\right)^{l}\,\|\phi^0(x) - \phi(x)\|_Q.$$

Thus, the number of iterations to achieve a relative reduction of $\epsilon$ in the error is at most proportional
to $\sqrt{\kappa_2}$. In the case of well-separated eigenvalues, however, often a better convergence can be observed.
This can be explained by the fact that the CG tends to optimize the solution in the direction of
extremal eigenvalues first, thereby reducing the effective condition number of the residual subspace.
For a discussion cf. [224].

This method can also be extended to the case of non-Hermitian matrices: if Eq. (3.82) is multiplied
from the left by the conjugate matrix $Q^\dagger(y,x)$, the resulting equation becomes

$$\sum_{xz} Q^\dagger(y,z)\,Q(z,x)\,\phi(x) = \sum_x R(y,x)\,\phi(x) = \sum_x Q^\dagger(y,x)\,\eta(x). \tag{3.101}$$

In this form the iteration is done using the new matrix $R(y,x) = \sum_z Q^\dagger(y,z)\,Q(z,x)$. Hence, this
method requires two matrix multiplications per iteration. But the situation is even worse: since the
new matrix $R(y,x)$ has a condition number $\kappa_2(R) = \kappa_2^2(Q)$, i.e. the square of that of $Q(y,x)$, the
bound on the number of iterations grows from $\sqrt{\kappa_2(Q)}$ to $\kappa_2(Q)$. Consequently, the CG algorithm is
much worse for these applications and should only be considered as a last resort if other methods fail.
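
The cost of squaring the matrix can be made concrete with the standard CG error bound $2\left((\sqrt{\kappa_2}-1)/(\sqrt{\kappa_2}+1)\right)^l$. The sketch below computes the smallest $l$ for which this bound drops below a target $\epsilon$; the condition number $10^4$ and the target $10^{-8}$ are arbitrary illustrative values, and the bound is only an upper limit on the actual iteration count.

```python
import math


def cg_iteration_bound(kappa, eps):
    """Smallest l with 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**l <= eps (kappa > 1)."""
    root = math.sqrt(kappa)
    rho = (root - 1.0) / (root + 1.0)
    return math.ceil(math.log(eps / 2.0) / math.log(rho))


kappa = 1.0e4                                            # illustrative condition number of Q
direct = cg_iteration_bound(kappa, 1e-8)                 # CG applied to Q itself
normal_eq = cg_iteration_bound(kappa ** 2, 1e-8)         # CG applied to R = Q^dagger Q
ratio = normal_eq / direct
```

The ratio of the two bounds is close to $\sqrt{\kappa_2(Q)}$, in addition to the doubled number of matrix multiplications per step.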



3.6.3 GMRES Algorithm 

In the case of a non-Hermitian matrix $Q(y,x)$ an orthogonal basis of the Krylov space can no longer
be constructed by a short recurrence relation among the residuals $r^k$. Thus, the whole space has to be
orthogonalized explicitly; this can be done using the Gram-Schmidt construction:

$$\begin{aligned}
v^0(y) &= r^0(y)/\|r^0(y)\|, \\
w^{k,0}(y) &= \sum_x Q(y,x)\,v^k(x), \\
w^{k,i+1}(y) &= w^{k,i}(y) - \left(v^i(y), w^{k,i}(y)\right)v^i(y), \qquad (i = 0,\dots,k), \\
v^{k+1}(y) &= w^{k,k+1}(y)/\|w^{k,k+1}(y)\|.
\end{aligned} \tag{3.102}$$

From the orthogonal basis of the Krylov space

$$K_l(Q(y,x), v^0(y)) = \mathrm{span}\left\{v^0(y),\dots,v^l(y)\right\},$$

the iterate $\phi^l(y)$ can be constructed via

$$\phi^l(y) = \phi^0(y) + \sum_k y_k\,v^k(y), \tag{3.103}$$

where the coefficients $y_k$ minimize the residual norm

$$f(\phi(y)) = \left\|\eta(y) - \sum_x Q(y,x)\,\phi(x)\right\|. \tag{3.104}$$

This method is known as the "Arnoldi method" [225]. Thus, the Generalized Minimal Residual (GMRES)
algorithm minimizes the function (3.104) instead of (3.96) as in the CG iteration.
The advantages of this method are that it can be applied to non-Hermitian matrices and
that the residual norms $\|r^k(y)\|$ can be determined without computing the iterates $\phi^k(y)$. The major
disadvantage is its huge memory consumption if the iteration number $l$ and the problem size $N$ are
large. Although this method converges exactly in $N$ steps, this point is out of reach in the cases of
interest in this thesis. Hence, only iterates up to a certain order $l \sim O(100)$ can be formed. In case
higher accuracy is required, the method should be restarted several times, discarding the previous Krylov
subspace. Furthermore, the convergence properties can be improved by replacing the Gram-Schmidt
orthogonalization by the Householder method. Thus, greater computer time consumption can be traded
for higher stability.
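
The orthogonalization step at the heart of GMRES can be sketched as follows. The code builds an orthonormal Krylov basis with the (modified) Gram-Schmidt procedure for a small real non-symmetric matrix; it covers only the basis construction, not the least-squares solve for the coefficients in Eq. (3.103), and the function names, the test matrix and the breakdown threshold are assumptions of this example.

```python
import math


def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def arnoldi_basis(Q, r0, steps):
    """Orthonormal basis v^0, ..., v^steps of the Krylov space generated by Q and r0."""
    norm = math.sqrt(dot(r0, r0))
    basis = [[x / norm for x in r0]]
    for k in range(steps):
        w = matvec(Q, basis[k])
        for v in basis:                          # remove components along all previous v^i
            projection = dot(v, w)
            w = [wi - projection * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-14:                         # Krylov space exhausted
            break
        basis.append([wi / norm for wi in w])
    return basis


Q = [[4.0, 1.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 0.0, 5.0]]
basis = arnoldi_basis(Q, [1.0, 0.0, 0.0, 0.0], steps=3)
```

Every new vector has to be orthogonalized against all previous ones, which is the origin of the memory consumption mentioned above.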

This method has been proposed in [220] for the generation of the polynomial $P_n(x)$ to be used in
multiboson algorithms, Eq. (3.66).

3.6.4 Stabilized Bi-Conjugate Gradient Algorithm

The Bi-Conjugate Gradient (Bi-CG) method is an extension to the CG algorithm which is also applicable
to non-Hermitian matrices. Unlike the proposal in Eq. (3.101) it does not square the original matrix
and thus does not worsen the condition number. Instead it requires the application of the Hermitian
conjugate matrix $Q^\dagger(y,x)$ to a conjugate set of residual and direction vectors, doubling the memory
requirements of the CG algorithm.

The updating prescription for the residuals then becomes

$$\begin{aligned}
r^k(y) &= r^{k-1}(y) - \alpha_k\sum_x Q(y,x)\,p^k(x), \\
\tilde r^k(y) &= \tilde r^{k-1}(y) - \alpha_k\sum_x Q^\dagger(y,x)\,\tilde p^k(x),
\end{aligned} \tag{3.105}$$

while for the search directions one gets

$$\begin{aligned}
p^k(y) &= r^{k-1}(y) + \beta_{k-1}\,p^{k-1}(y), \\
\tilde p^k(y) &= \tilde r^{k-1}(y) + \beta_{k-1}\,\tilde p^{k-1}(y).
\end{aligned} \tag{3.106}$$

Now the choices

$$\alpha_k = \frac{\left(\tilde r^{k-1}(y), r^{k-1}(y)\right)}{\left(\tilde p^k(y), \sum_x Q(y,x)\,p^k(x)\right)}, \qquad
\beta_k = \frac{\left(\tilde r^k(y), r^k(y)\right)}{\left(\tilde r^{k-1}(y), r^{k-1}(y)\right)}$$

enforce the bi-orthogonality relations

$$\left(\tilde r^k(y), r^l(y)\right) = 0, \qquad \left(\tilde p^k(y), \sum_x Q(y,x)\,p^l(x)\right) = 0 \qquad (k \ne l).$$

This method allows inversion of non-Hermitian matrices but does not show a stable convergence pattern
in all cases. It may converge irregularly or even fail completely. Therefore several modifications have
been proposed to make the convergence smoother (for an overview see [214]). The method known as
Stabilized Bi-Conjugate Gradient (Bi-CGStab), as introduced by van der Vorst in [226], does not
require the Hermitian conjugate matrix to be used, but has an overall cost similar to the Bi-CG method
just discussed.
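
A minimal Python sketch of the plain Bi-CG iteration (not the stabilized variant) is given below. It assumes a small real non-symmetric matrix, so that the Hermitian conjugate is the transpose, starts from $\phi^0 = 0$ and takes the shadow residual equal to the initial residual, which is a common but not mandatory choice. No safeguards against the breakdowns mentioned above are included.

```python
def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bicg(Q, eta, tol=1e-12, max_iter=100):
    """Bi-CG for a real matrix Q; returns the iterate and the number of steps."""
    Qt = [list(col) for col in zip(*Q)]          # transpose = Hermitian conjugate for real Q
    phi = [0.0] * len(eta)
    r = list(eta)                                # residual for phi^0 = 0
    rt = list(r)                                 # shadow residual
    p, pt = list(r), list(rt)
    rho = dot(rt, r)
    eta_norm = dot(eta, eta)
    for k in range(1, max_iter + 1):
        Qp = matvec(Q, p)
        alpha = rho / dot(pt, Qp)
        phi = [x + alpha * d for x, d in zip(phi, p)]
        r = [x - alpha * d for x, d in zip(r, Qp)]
        rt = [x - alpha * d for x, d in zip(rt, matvec(Qt, pt))]
        if dot(r, r) <= tol * tol * eta_norm:
            return phi, k
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [x + beta * d for x, d in zip(r, p)]
        pt = [x + beta * d for x, d in zip(rt, pt)]
        rho = rho_new
    return phi, max_iter


phi, steps = bicg([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

For this $2\times 2$ example the iteration terminates after two steps, in line with the finite termination property of Krylov methods in exact arithmetic.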



3.7 Eigenvalue Algorithms 

An important ingredient of the application of multiboson algorithms as described in this thesis is the
knowledge of how to tune the polynomials to the eigenvalue spectrum of the matrix. Thus, it is of
great importance to have methods available to correctly compute at least the borders of the eigenvalue
spectrum. Another possible application is the preconditioning of the matrix to simplify the evaluation of
observables. In this thesis, this approach has only been used in the measurement of the correction factor
(3.71). However, it can also be applied to the measurement of other observables as
discussed in Sec. 3.3. This approach has been examined in [139].

The matrix $Q(y,x)$ is said to have an eigenvector $\xi_i(y)$ with corresponding eigenvalue $\lambda_i$ iff

$$\sum_x Q(y,x)\,\xi_i(x) = \lambda_i\,\xi_i(y). \tag{3.107}$$

A necessary condition for (3.107) is

$$\det\left|Q(y,x) - \lambda\,\delta(y,x)\right| = 0, \tag{3.108}$$

which translates to a polynomial of degree $N$ which has exactly $N$ complex roots. These roots need
not be distinct. Furthermore, Eq. (3.108) implies that to every eigenvalue $\lambda_i$ there corresponds an
eigenvector, since the matrix $Q(y,x) - \lambda_i\,\delta(y,x)$ is singular and thus has a kernel with dimension $\ge 1$
[45]. One important property is that the eigenvalues may be shifted by some constant $\tau$ by adding
$\tau\,\xi_i(y)$ to both sides of Eq. (3.107).

Some matrices fulfill the normality condition²

$$\sum_z Q^\dagger(y,z)\,Q(z,x) = \sum_z Q(y,z)\,Q^\dagger(z,x). \tag{3.109}$$

The eigenvectors of a matrix fulfilling Eq. (3.109) span the whole vector space $\mathbb{C}^N$. Applying
Gram-Schmidt orthogonalization to this set of eigenvectors yields an orthonormal basis and thus a matrix
$V(y,x)$ fulfilling the unitarity condition

$$\sum_z V^\dagger(y,z)\,V(z,x) = \sum_z V(y,z)\,V^\dagger(z,x) = \delta(y,x).$$

Although an arbitrary matrix has exactly $N$ eigenvalues (counted with multiplicity), its eigenvectors
do not necessarily span the whole vector space $\mathbb{C}^N$. In such a case the matrix is said to be
defective.

The order-$k$ Krylov subspace of a matrix $Q(y,x)$ on a certain starting vector $\eta(y)$, defined in
Eq. (3.100) in Sec. 3.6.2, can be used to compute eigenvalues in the following way: Given the eigenvectors
of $Q(y,x)$, $\{\xi_1(y),\dots,\xi_i(y)\}$, which span a space of dimension $\dim\{\xi_1(y),\dots,\xi_i(y)\} = l \le i$,
choosing a vector

$$\eta(y) = \sum_{j=1}^{i} c_j\,\xi_j(y)$$

will result in a Krylov space whose dimension $\dim K(Q(y,x),\eta(y))$ can be at most $l$. Repeated
application of $Q(y,x)$ on the starting vector will yield for the $k$th element of the Krylov space

$$\sum_x Q^k(y,x)\,\eta(x) = \sum_{j=1}^{i} c_j\,\lambda_j^k\,\xi_j(y).$$

This recipe will increase the projection of $\eta(y)$ on the eigenvector whose corresponding eigenvalue has
the largest magnitude $\lambda_{\max}$. Thus, in the limit $k \to \infty$, the iteration will converge to the eigenvector
belonging to the eigenvalue of largest magnitude.

For a properly chosen starting vector $\eta(y)$ which has an overlap with all eigenvectors, the Krylov
iteration will consequently yield the eigenvector corresponding to the eigenvalue with largest magnitude.
Repeated application of this procedure with orthogonalization of the starting vector to previously found
eigenvectors allows in principle to reconstruct the complete spectrum.
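
The repeated application of $Q$ to a starting vector is the classical power iteration. A minimal Python sketch, assuming a small real symmetric matrix and a starting vector with non-vanishing overlap with the dominant eigenvector (the normalization in every step is added here only to avoid overflow):

```python
import math


def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def power_iteration(Q, eta, iterations=100):
    """Return the dominant eigenvalue estimate (Rayleigh quotient) and eigenvector."""
    v = list(eta)
    for _ in range(iterations):
        w = matvec(Q, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]            # keep the iterate normalized
    return dot(v, matvec(Q, v)), v


# The matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1.
lam_max, vec = power_iteration([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0])
```

The error decreases with the ratio of the two largest eigenvalue magnitudes per step, which is why this plain form becomes slow when the eigenvalues at the border of the spectrum are close together.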

However, the straightforward application is quite cumbersome. In practice it has turned out to be more
economical to compute the subspace of several eigenvalues from the border of the spectrum together
and afterwards to determine the largest eigenvector from these iterates. One of the methods achieving
this goal is called Arnoldi iteration [144]³. If more than a single eigenvalue/eigenvector pair is
required, this method is the most efficient way to determine the spectrum of a matrix.

Once the eigenvalue(s) closest to the origin are known, one can also use this knowledge to simplify the
inversion of a matrix using any of the algorithms discussed in Sec. 3.6. With $N$ eigenvectors $\{\xi_i(y)\}$
and their corresponding eigenvalues $\{\lambda_i\}$ known, one can compute

$$\eta'(y) = \eta(y) - \sum_{i=1}^{N}\frac{(\xi_i,\eta)}{(\xi_i,\xi_i)}\,\xi_i(y) \tag{3.111}$$

and then use $\eta'(y)$ as a starting point for the inversion. The resulting inverse $\phi(y)$ is then given by

$$\phi(y) = \sum_x Q^{-1}(y,x)\,\eta'(x) + \sum_{i=1}^{N}\frac{1}{\lambda_i}\,\frac{(\xi_i,\eta)}{(\xi_i,\xi_i)}\,\xi_i(y). \tag{3.112}$$

The problem of computing $\sum_x Q^{-1}(y,x)\,\eta'(x)$ may now have a significantly reduced condition number
since the $N$ smallest eigenvalues have been removed. For a highly singular matrix $Q(y,x)$, the cost to
compute the eigenvalues (which is independent of the condition number) may be lower than the cost
for the complete inversion. However, it has been shown in [189] that for the condition numbers used
in the SESAM project [133, 227, 228] (which are similar to or even higher than those considered in
this thesis), this is not yet the case.

² As discussed in Sec. 2.6.4, the Wilson matrix does not share this property.
³ This algorithm is already coded in the ARPACK package and can be found at
http://www.caam.rice.edu/software/ARPACK/

The main focus of this chapter is the optimization and tuning of multiboson algorithms with an emphasis
on the TSMB algorithm introduced by Montvay [202]. The details of the algorithm have been discussed
in Sec. 3.5.3. Throughout this chapter, the focus lies mainly on the survey of QCD with two degenerate,
dynamical fermion flavors on various lattice sizes with fixed physical parameters given in Tab. 4.1 (for
a precise measurement see [148], also cf. [225, 227]). These numbers are an excerpt from Tab. 5.1.



        Bare parameters        |                 Physical parameters
 N_f     β       κ       | (a m_π)       (a m_ρ)       m_π/m_ρ        a/fm
 2       5.5     0.159   | 0.4406(33)    0.5507(59)    0.8001(104)    0.141

Table 4.1: Bare and physical parameters for most runs presented.



In Sec. 4.1 the static aspects of the polynomial approximations are discussed. The question to be
answered is how to choose the approximation of an inverse power of the Wilson matrix in the most
efficient way if one resorts to a static approximation (cf. Sec. 3.6.1).

Section 4.2 investigates the tuning of the dynamical aspects of multiboson algorithms. After a detailed
presentation of the tools used for the efficiency analysis in Sec. 4.2.1, the practical application to an
aspect of major importance, namely the dependence of the performance on the order $n_1$ of the polynomial
(3.66), is investigated in Sec. 4.2.2. The results presented here should be independent of the particular
implementation of the algorithm and thus apply to other variants of MB algorithms apart from TSMB
as well. The impact of reweighting is analyzed in Sec. 4.2.5 and finally the updating strategy is discussed
in Sec. 14.2.^1 The updating strategy consists of the proper combination of local updating sweeps which 
make up a single trajectory. A trajectory is then the logical partition after which an iteration of update 
sweeps restarts. 

The practical implementations of multiboson algorithms are discussed in Sec. 4.3. The two major
platforms on which the multiboson algorithm has been implemented are compared and performance
measurements are presented.

Section IP summarizes the results from this chapter. 



4.1 Optimizing the Polynomial Approximation 

In order to find the required approximations for the TSMB algorithm, one has to focus first on the
behavior of the polynomial approximation in the static case. This concerns the application of the inversion
to a single gauge field configuration with known condition number and eigenvalue distribution.
In the following, a particular thermalized gauge field configuration at the physical point given in Tab. 4.1
on an $\Omega = 8^4$ lattice will be considered. The extremal eigenvalues and the condition number of the
Wilson matrix $Q^2(y,x)$ for this gauge field configuration are given in Tab. 4.2. A histogram of the
lowest 512 eigenvalues is shown in Fig. 4.1. Figure 4.2 shows the corresponding histogram of the largest
eigenvalues. As is evident from these plots, the eigenvalue density is small at the lower and upper
ends of the interval and increases towards the middle.

 λ_min              λ_max      λ_max/λ_min
 5.4157 × 10⁻⁴      2.2052     4071.9

Table 4.2: Extremal eigenvalues and the condition number of $Q^2(y,x)$.




Figure 4.1: Histogram of the 512 smallest eigenvalues of $Q^2(y,x)$.



4.1.1 Tuning the Quadratically Optimized Polynomials 

The quality of the approximation provided by the polynomial (3.66) does not only depend on its order,
but also on the choice of the interval in which it should approximate the function under consideration.
Now the optimal choice of the approximation interval, $[\epsilon, \lambda]$, will be determined for a quadratically
optimized polynomial introduced in Sec. 3.6.1. Figure 4.3 displays the function

$$\lambda^{\alpha}\,P_{n_1}(\lambda)$$

of a quadratically optimized polynomial with $n_1 = 20$, $\alpha = 1$ and $[\epsilon, \lambda] = [7.5\times 10^{-4}, 3]$. The quality of
the approximation is best at the upper end of the interval, while already slightly above the upper limit
it quickly becomes useless. At the lower end of the interval the approximation is worse, but the limit
is not as stringent as in the former case.

These observations fix the strategy for finding the optimal interval: The upper limit must be chosen
very conservatively, i.e. large enough that during the simulation runs an eigenvalue never leaves this
interval. In the following, the choice $\lambda = 3$ will be adopted unless otherwise stated. The lower end
may be chosen more freely; in particular, it may be chosen larger than the smallest eigenvalue since
the eigenvalue density is largest in the middle of the interval. Raising the lower limit will make the
approximation for the smallest eigenvalues worse, but will increase the quality of the polynomial in the
middle, where the majority of eigenvalues is located.

Figure 4.2: Histogram of the 512 largest eigenvalues of $Q^2(y,x)$.

Figure 4.3: Test function $\lambda^{\alpha}P_{n_1=20}(\lambda)$ for a quadratically optimized polynomial.

Measures of Accuracy 

To find a measure for the quality of the polynomial approximation for a particular matrix (in this case
the square of the Hermitian Wilson matrix, $Q^2(y,x)$, for the gauge field configuration discussed above),
consider the matrix defined by

$$R_n(Q^2) = 1 - Q^{2\alpha}\,P_n(Q^2). \tag{4.1}$$

Then the following two definitions of matrix norms will be used:

1. Measure the vector norm $|\zeta(x)|$ of the vector defined by

$$\zeta(y) = \sum_x R_n(Q^2)(y,x)\,\eta(x), \tag{4.2}$$


where rj(x) is a Gaussian random vector with unit width. The average vector norm |£(a;)| = 
I Sx£( a OI f° r a sample of {77} will be denoted by \R n (Q 2 )\- 

2. Measure the expectation value

$$\langle R_n(Q^2)\rangle = \sum_{\eta(x)}\sum_{xy}\eta^\dagger(y)\,R_n(Q^2)(y,x)\,\eta(x), \tag{4.3}$$

where $\eta(x)$ is again a Gaussian random vector with unit width. This quantity is not a norm,
however. Since it is not positive definite, the absolute value of $\langle R_n(Q^2)\rangle$ will be used in the
following and will be denoted by $\|R_n(Q^2)\|$.
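
Both measures can be estimated stochastically with Gaussian noise vectors. The sketch below does this for a diagonal toy matrix, where $R_n$ is diagonal as well. Several ingredients are assumptions of the example and not taken from the text: $\alpha = 1$, the stand-in polynomial $P_n(x) = \sum_{k=0}^{n}(1-x)^k$ (for which $1 - xP_n(x) = (1-x)^{n+1}$), the Euclidean norm for the first measure, and the normalization of the second measure by the number of samples.

```python
import math
import random


def residual_eigenvalue(lam, n):
    """Eigenvalue of R_n = 1 - lam * P_n(lam) for the stand-in P_n(x) = sum_{k<=n} (1 - x)^k."""
    return (1.0 - lam) ** (n + 1)


def norm_estimates(eigenvalues, n, samples=100, seed=1):
    """Noisy estimates of the vector norm and of |<eta, R_n eta>| for a diagonal matrix."""
    rng = random.Random(seed)
    vector_norms, quadratic_forms = [], []
    for _ in range(samples):
        eta = [rng.gauss(0.0, 1.0) for _ in eigenvalues]
        zeta = [residual_eigenvalue(l, n) * e for l, e in zip(eigenvalues, eta)]
        vector_norms.append(math.sqrt(sum(z * z for z in zeta)))
        quadratic_forms.append(sum(e * z for e, z in zip(eta, zeta)))   # may be negative
    return sum(vector_norms) / samples, abs(sum(quadratic_forms) / samples)


spectrum = [0.1 + 1.8 * i / 49 for i in range(50)]     # toy eigenvalues in (0, 2)
vec10, quad10 = norm_estimates(spectrum, 10)
vec40, quad40 = norm_estimates(spectrum, 40)
```

For an even $n$ the individual quadratic forms can have either sign, which illustrates why the absolute value is taken in the second definition.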

These definitions can also be applied to the case of the inverse square root defined in Eq. i|3.76[l . This 
is done by replacing R n (-) by i?™^(-), which is defined by 

RZ(Q 2 ) = 1 - P n3 (Q 2 )(Pn 2 (Q 2 )) 2 ■ (4.4) 

In particular, H-ff^Q 2 )!! is the exponential factor in the noisy correction step of the TSMB algorithm, 
Eq. (|3.78|) . if the old configuration is chosen equal to the new one. 

When computing the matrix norms of $R_n$ for too small orders $n$, the fluctuations of the norms will
be large. In particular, the matrix norm (4.3) at small $n$ will be close to (3.59), i.e. the inverse of
the determinant of $Q$. The opposite limit $n \to \infty$ will correspond to the determinant itself. As has
been discussed in Sec. 3.5.1, the fluctuations of (3.59) are huge, while those of its inverse are small.
Thus, the fluctuations will decrease for increasing values of $n$. Therefore, the optimization of the static
approximation should be performed for comparatively large orders.

Fixing the Lower Limit 

As has been argued, it is important to have a recipe for fixing the lower limit of a quadratically
optimized polynomial for a given order. First consider the choice $n_1 = 20$, for which the two matrix
norms together with their standard errors are displayed in Fig. 4.4 for varying values of $\epsilon$. For each
point a sample of 100 Gaussian vectors has been considered. While $\|R_{20}\|$ displays a minimum at the
lower end of the interval (where the smallest eigenvalue is located), $|R_{20}|$ stays more or less constant
over a range of more than one order of magnitude. Thus, for small orders, one cannot rule out that a
choice $\epsilon \gg \lambda_{\min}$ is practical.

Figure 4.4: Norms $|R_{20}|$ and $\|R_{20}\|$ vs. the lower interval limit $\epsilon$.

Next consider the case $n_1 = 180$, which should already provide a very good approximation to the inverse
function. Figure 4.5 again shows the two matrix norms for varying values of $\epsilon$. The curve of $|R_{180}|$
clearly displays a minimum at $\epsilon_{\mathrm{opt}} = 4.5\times 10^{-4}$, which is about 20% smaller than $\lambda_{\min}$. The curve of
$\|R_{180}\|$ shows a more or less continuous increase with larger errors.

Figure 4.5: Norms $|R_{180}|$ and $\|R_{180}\|$ as defined in Eq. (4.2) vs. the lower interval limit $\epsilon$.

Finally, the situation regarding the third polynomial must be clarified. In general, the systematic error of a simulation run should be bounded to be much smaller than the statistical error of any quantity measured. The magnitude of the error can be estimated by considering a noisy estimate for the determinant, Eq. (3.78), with the old configuration being equal to the new one, i.e.

$$ U' = U . $$

If the approximation were exact, the acceptance probability would be equal to one. However, any deviation in the exponential could cause spurious acceptances or rejections. Since any negative value in the exponential in (3.78) would cause the configuration to be accepted in any case, the case of large negative values of the exponential factor can be completely disregarded. On the other hand, for large positive values of the argument, the influence of any error on the acceptance rate will be minor due to the flat tail of the exponential function. Hence, the largest influence is to be expected for values around zero.

To quantify the influence of this systematic error one can consider the following model for the exponential correction factor:

$$ E(\sigma, b, x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-b)^2}{2\sigma^2} \right) . \tag{4.5} $$



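The resulting acceptance rate can be computed to yield

$$ P_{\text{acc}}(\sigma, b) = \int_{-\infty}^{0} dx\, E(\sigma, b, x) + \int_{0}^{\infty} dx\, E(\sigma, b, x) \exp(-x) = \frac{1}{2}\left[ 1 - \operatorname{Erf}\left( \frac{b}{\sqrt{2}\,\sigma} \right) \right] + \frac{1}{2}\left[ 1 - \operatorname{Erf}\left( \frac{\sigma^2 - b}{\sqrt{2}\,\sigma} \right) \right] \exp\left( -b + \frac{\sigma^2}{2} \right) . \tag{4.6} $$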



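Using Eq. (4.6), one can compute the actual systematic error by measuring $\sigma$, $b$ and $\|R_{n_3}(Q^2)\|$ in a given run and considering the resulting change in acceptance rates

$$ \Delta P_{\text{acc}}\left(\sigma, b, \Delta b = \|R_{n_3}(Q^2)\|\right) = \left| P_{\text{acc}}(\sigma, b) - P_{\text{acc}}\left(\sigma, b - \|R_{n_3}(Q^2)\|\right) \right| . \tag{4.7} $$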



The resulting number is the systematic error for a single trajectory. As a rule of thumb one should not allow $\Delta P_{\text{acc}}(\sigma, b, \Delta b)$ to exceed values of $1 \times 10^{-3}$. In most situations, however, it is possible to make it as small as $1 \times 10^{-5}$. Any systematic error will then be negligible.
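
The estimate of Eqs. (4.6) and (4.7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model $E(\sigma, b, x)$ is taken to be a Gaussian in $x$ with mean $b$ and width $\sigma$, and the function names as well as the sample values of $\sigma$, $b$ and $\Delta b$ are hypothetical and not taken from any run. The numerical integration serves only as an independent cross-check of the closed form.

```python
import math

def p_acc(sigma, b):
    """Closed-form acceptance rate for a Gaussian exponent x (mean b, width sigma).

    Proposals with x < 0 are always accepted, those with x > 0 with probability exp(-x).
    """
    s = math.sqrt(2.0) * sigma
    always = 0.5 * (1.0 - math.erf(b / s))
    sometimes = 0.5 * (1.0 - math.erf((sigma**2 - b) / s)) * math.exp(-b + 0.5 * sigma**2)
    return always + sometimes

def p_acc_numeric(sigma, b, steps=200000):
    """Midpoint-rule integration of the same model (cross-check only)."""
    lo, hi = b - 12.0 * sigma, b + 12.0 * sigma
    h = (hi - lo) / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        weight = norm * math.exp(-0.5 * ((x - b) / sigma) ** 2)
        total += weight * min(1.0, math.exp(-x)) * h
    return total

def delta_p_acc(sigma, b, delta_b):
    """Change in acceptance rate when the mean b is shifted by delta_b."""
    return abs(p_acc(sigma, b) - p_acc(sigma, b - delta_b))

# Hypothetical sample values, chosen only for illustration.
sigma, b, delta_b = 1.0, 0.3, 1.0e-3
print(p_acc(sigma, b), p_acc_numeric(sigma, b), delta_p_acc(sigma, b, delta_b))
```

In practice one would insert the measured $\sigma$, $b$ and residual norm and compare the result of `delta_p_acc` with the rule-of-thumb bound quoted above.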

Figure 4.6 shows plots of the two norms with $n_3 = 200$ (the value $n_2 = 160$ has been chosen to be compatible with the situation discussed above) vs. the lower limit of the interval, $\epsilon$. Both norms attain their minimal values at lower interval limits of $\epsilon \approx 7.5 \times 10^{-4} \approx 1.5 \times \lambda_{\min}$. However, the important norm in this case is $\|R_{n_3}(Q^2)\|$: already if $\epsilon$ is varied by a factor of 2, one approaches a region where the systematic error may become significant. One has to keep in mind that in a dynamical simulation fluctuations may cause the smallest eigenvalue to become smaller than in the present case. Therefore, the interval for the third polynomial has to be chosen far more conservatively than for the other polynomials, and the residual norm must be adjusted by increasing the order $n_3$. In some cases, there is a different problem related to this strategy, cf. Sec. 4.3.3 below.

One can conclude that the choice of the approximation interval for quadratically optimized polynomials 
can have a large impact on the quality of the approximation. While in the case of the first polynomial 
(where one deals with a comparatively small order), the choice of the lower limit only has a small 
impact, the situation changes as the order is increased. For orders as large as the second polynomial in 
the two-step approximation, the optimal choice for the lower limit is slightly smaller than the smallest 
eigenvalue of the matrix, while for the third polynomial, the choice of the lower limit should be made 
extremely conservative. The interval should always cover every single eigenvalue and ensure that $\Delta P_{\text{acc}}$ in Eq. (4.7) is sufficiently small for the algorithm to be free of systematic errors.

In the case of a dynamical simulation, the choice of the interval should be guided by the average values of the smallest eigenvalue. However, this information is only rarely available prior to the run. It may therefore be necessary to readjust the polynomials during a run as more information becomes available. In this way one can reduce the total runtime, but at the price of more effort and logistics. Appendix D gives a framework for handling this type of run.

Figure 4.6: Residual norms for $R_{200}(Q^2)$ vs. the lower interval limit $\epsilon$.

In any case, one has to make sure that the third polynomial is sufficiently good by making a very 
conservative decision regarding the lower limit and making the order sufficiently large as to keep the 
systematic error bounded to at most a few percent. 



4.1.2 Algorithm for Polynomials 



As has been discussed in Sec. 3.5.3, different methods are possible for finding the polynomial which approximates the inverse of the Wilson matrix. The quadratically optimized polynomials do not require explicit knowledge of the eigenvalue density in the approximation interval. On the other hand, the method proposed by de Forcrand in [221] requires a thermalized gauge field configuration to be available. As has been noted in the latter publication, taking a single gauge field configuration may already be sufficient since the results from several configurations are similar.

Figure 4.3 already showed the deviation of a quadratically optimized polynomial. The GMRES algorithm discussed in Sec. 3.6.3 will now be used to construct the polynomial dynamically on a (different) thermalized configuration. The resulting plot of $\lambda^{\alpha=1} P_{n_1=20}(\lambda)$ is displayed in Fig. 4.7. For comparison the quadratically optimized polynomial from Fig. 4.3 is also shown.




Figure 4.7: Polynomial $\lambda^{\alpha=1} P_{20}(\lambda)$ which has been obtained by applying the GMRES algorithm to a thermalized gauge field configuration together with the corresponding quadratically optimized polynomial.



It is apparent that the quadratically optimized polynomial performs worse in the middle of the spectrum, where most eigenvalues are located. In contrast, the GMRES polynomial respects the spectral density of the Wilson matrix and thus results in a better approximation. The disadvantage is that the underlying algorithm becomes unstable when computing the coefficients of the polynomials for larger orders if it is run with only 64-bit precision. This instability already becomes apparent for orders slightly beyond $n_1 = 20$. A further problem is that usually one does not have a thermalized gauge field configuration for a particular physical point prior to the calculation. It is therefore necessary to perform the optimization process dynamically during the sampling and to readjust the polynomials after a certain number of trajectories. Similar to the case of quadratically optimized polynomials, this requires more effort.

The influence of the qualitative difference is displayed in Fig. 4.8. The GMRES results are compared to the quadratically optimized polynomials for a number of different orders. The polynomial interval for the quadratically optimized polynomials has been chosen to be $[\epsilon, \lambda] = [6 \times 10^{-4}, 2.5]$. The GMRES polynomials clearly exhibit smaller residuals and are thus superior. However, since the computation has only been performed with 64-bit precision, the numerical instabilities are already visible at $n_1 = 20$.

Figure 4.8: Residual vector norm for both the GMRES and the quadratically optimized polynomials.



This scheme can easily be extended to the case of any rational number $\alpha$, i.e. any rational number of fermion flavors. However, the instability of this method will also grow as the number of multiplications required increases. Therefore, this method has not been applied in the following. This is a place where further research is needed. One way to solve this problem would be to implement the polynomial algorithm using very high precision arithmetic, similar to what has been done in [229]. Another way could consist of using a different scheme which does not resort directly to the expansion coefficients in Krylov space.

4.1.3 Computing the Reweighting Correction 

When using the TSMB algorithm for the correction step as discussed in Sec. 3.5.3, one will perform a noisy estimate for the inverse determinant using a static inversion algorithm with the polynomial $P_{n_2}(x)$. Any residual systematic error in the ensemble of gauge field configurations generated will then have to be repaired with a multicanonical reweighting. An observable will be computed using either (3.50) or euj.

In order to find an efficient way to perform this reweighting, it is assumed that the configuration under consideration has been computed using the TSMB algorithm with action (3.74), where the quadratically optimized polynomial (cf. Sec. 4.1.1) $P_{n_1}(x) P_{n_2}(x) \simeq P_{n_1+n_2=180}(x)$ has been employed with the interval $[\epsilon, \lambda] = [7.5 \times 10^{-4}, 3]$. The overall systematic error from the simulation run is thus determined from the polynomial $P_{180}(x)$ alone. The observable under consideration is now $A = 1$, i.e. the correction alone is being measured. The correction factor from the individual 512 lowest eigenvalues of $Q^2$ is plotted in Fig. 4.9. Evidently, the correction is mostly related to the lowest eigenvalue alone ¹.

Figure 4.9: Individual correction factors as computed using the 512 lowest eigenvalues of $Q^2(y,x)$ for the quadratically optimized polynomial $P_{180}(Q^2)$.

Figure 4.10: Similar to Fig. 4.9 but with the cumulative correction factor from the 512 lowest eigenvalues.

This finding is confirmed when examining the convergence behavior of the correction factor with respect to the number of eigenvalues computed. The cumulative factor in Eq. (3.81) as a function of the number of eigenvalues taken into account is shown in Fig. 4.10 and gives an average value of 0.885. Although larger eigenvalues still introduce fluctuations, the major impact comes from the smallest eigenvalue alone.

The alternative way to compute the correction factor is provided by the evaluation of (3.80). From 100 noisy vectors one observes that the approximation has already converged at order $n_4 = 500$. The total correction factor from this method is

$$ \langle 1 \rangle_{P_{n_4=500}} = 0.8783 \pm 0.0113 , $$

which is completely consistent with the value obtained from Fig. 4.10.

In conclusion, when estimating the correction factor on the basis of eigenvectors on an $\Omega = 8^4$ lattice alone, it makes sense to use only a small fraction (definitely less than 32) of the lowest eigenvalues. The fluctuations introduced by the larger eigenvalues do not have a significant influence on the total result. The evaluation of the correction factor using a fourth polynomial is a practical alternative and avoids having to compute a fraction of all eigenvalues. The only potential problem is that the smallest eigenvalue of $Q^2(y,x)$ may lie outside the interval of $P_{n_4}(\cdot)$, which could lead to incorrect results. This problem can again be controlled by choosing an extremely conservative lower limit $\epsilon$ or even $\epsilon = 0$ ².

¹ The situation would change if the GMRES polynomials had been used, since they also perform a worse approximation on the upper end of the interval, cf. Fig. 4.7.
² As has been discussed in [218], the convergence will no longer be exponential in this case. Since the total runtime of the correction step is negligible compared to the whole run, this approach still appears to be justified.

Once one considers larger lattices up to $\Omega = 32 \times 16^3$ and beyond, the eigenvalue approach may become too costly since the eigenvalue density increases linearly with the volume and consequently a larger number of eigenvalues needs to be computed to cover an equivalent fraction of the spectrum.

4.2 Tuning the Dynamical Parameters 

After the optimal matrix inversion using a non-adaptive polynomial for a single gauge field configuration (or a limited set of them) has been found, there remains the task of examining the dynamical behavior of these approximations. This is a question of major interest for the practical implementation of any multiboson algorithm, since after all one is interested in using the approximations in a dynamical updating process. It may happen that fluctuations of the eigenvalue density temporarily cause eigenvalues to run out of the approximation interval. This can have a dramatic impact on the performance of the algorithm. It is therefore of considerable importance to assess the size of these fluctuations and the impact they could have on a simulation run. It is important to notice that these aspects may still be explored on rather small lattices, since these exhibit larger fluctuations and hence show a larger sensitivity to these vulnerabilities.

4.2.1 Practical Determination of Autocorrelations 

Before proceeding further, the tools must be prepared to compute the primary measure of efficiency in the dynamical case, namely the autocorrelation time of a time series. Since the aim of any simulation algorithm is to generate statistically independent gauge field configurations with minimal effort, the autocorrelation time is the key monitor for the cost determination of a particular algorithm. The theoretical bases of methods to compute autocorrelations of time series have been laid in Sec. 3.2. The purpose of this section is to apply them to two different time series obtained from actual simulation runs.

In the first case, the series has a low fluctuation and is sufficiently long for the autocorrelation time to be measured. The second situation is less suitable: the time series exhibits large fluctuations and a rather large autocorrelation time. Furthermore, it shows a contamination by a very long mode which introduces fluctuations on a time scale comparable to the length of the series itself. This mode appears to be separated from the other modes contained in the series. Given the fact that the total lattice size is given by $L \simeq 1.128$ fm (cf. Tab. 4.1), one may suspect that the simulation is already very close to the shielding transition, see also Sec. 6.1. This could explain the observed behavior and the presence of the long-ranged mode. This mode contaminates the results, and unless it is possible to perform simulations on a series at least two orders of magnitude longer, no statement can be made about its length. In this case it will become evident that the lag-differencing method as discussed in Sec. 3.2.5 is still able to extract information from the series although the other methods fail.

Case I: Low Fluctuations 

This time series has been taken from a simulation run using the physical parameters displayed in Tab. 4.1 on an $\Omega = 32 \times 16^3$ lattice. The algorithm employed is the HMC algorithm with SSOR preconditioning. The molecular dynamics integration algorithm is the leap-frog scheme with a time step of $\Delta t = 1 \times 10^{-2}$ and a trajectory length of $n_{\text{MD}} = 100 \pm 20$. The resulting acceptance rate is 71.6%. The total sample consists of 4518 trajectories, from which the leading 1000 trajectories have been discarded. The complete time series is given in Fig. 4.11.

Figure 4.12 shows the normalized autocorrelation function (dotted curve), together with the integrated autocorrelation time (blue curve) as a function of the cutoff. The windowing procedure discussed in Sec. 3.2.5 has been applied with $c = 4, 6$, resulting in the green and red lines, respectively. From the $c = 6$ line, one can read off an integrated autocorrelation time of $\tau_{\text{int}} = 11.28 \pm 0.43$. The dashed-dotted line displays the maximum of the curve, which is clearly compatible with the $c = 6$ window.

Figure 4.11: Plaquette history of HMC run.

Figure 4.12: Normalized autocorrelation function and integrated autocorrelation time vs. the cutoff for the HMC plaquette history.

A particular problem is already visible in the behavior of the normalized autocorrelation function. It does not approach zero exponentially (as one would expect), but appears to reach a plateau above zero, before it suddenly drops. The curve is not compatible with zero at this point since it is almost two standard deviations too high. This is a typical case of the linear bias mentioned in Sec. 3.2.5. The lag-differencing method which will be applied below is able to handle this situation.
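
The windowing prescription applied to the autocorrelation function above can be illustrated with a short Python sketch. It assumes the convention $\tau_{\text{int}}(W) = \frac{1}{2} + \sum_{t=1}^{W} \rho(t)$ and stops at the smallest cutoff $W$ with $W \ge c\,\tau_{\text{int}}(W)$; the conventions of Sec. 3.2.5 may differ in detail. The function names and the synthetic AR(1) test series are hypothetical and stand in for a real plaquette history.

```python
import random

def ar1_series(n, a, seed=1):
    """Synthetic AR(1) series x[t] = a*x[t-1] + noise with rho(t) = a**t (test data only)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def tau_int_windowed(series, c=6.0):
    """Integrated autocorrelation time with an automatic window of width c.

    Returns (tau_int, W), where W is the cutoff at which the sum was stopped.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n  # variance, i.e. the unnormalized rho(0)
    tau = 0.5
    for w in range(1, n // 2):
        # Normalized autocorrelation at lag w, computed on demand.
        rho_w = sum(dev[i] * dev[i + w] for i in range(n - w)) / ((n - w) * c0)
        tau += rho_w
        if w >= c * tau:
            return tau, w
    return tau, n // 2 - 1

# For an AR(1) process the exact value is (1 + a) / (2 * (1 - a)), i.e. 4.5 for a = 0.8.
series = ar1_series(50000, 0.8)
tau, window = tau_int_windowed(series, c=6.0)
print(tau, window)
```

A smaller window factor such as $c = 4$ truncates the sum earlier and therefore tends to underestimate $\tau_{\text{int}}$ when slow modes are present.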
Next, the variance is estimated using the Jackknife method (cf. Sec. 3.2.5). Figure 4.13 shows the variance $\sigma_B(\text{Plaquette})$ as a function of the bin size $B$. The variance reaches a plateau (red line) at $\sigma(\text{Plaquette}) \approx (1.41 \pm 0.53) \times 10^{-9}$, which yields the true variance of the plaquette. The result for $B = 1$ is given by $\sigma_{B=1}(\text{Plaquette}) = 7.53 \times 10^{-11}$. Applying Eq. (3.30) now yields $\tau_{\text{int}} \approx 9.36 \pm 3.52$, which is slightly below the result from the previous methods, but with a much larger uncertainty. It must be realized that this procedure should only give a rough estimate of the "true" value of $\tau_{\text{int}}$, see [148] for a thorough discussion.
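
The arithmetic of this estimate can be reproduced with a short Python sketch. The relation $\tau_{\text{int}} \approx \sigma_{\text{plateau}} / (2\,\sigma_{B=1})$ used here is inferred from the quoted numbers, which it reproduces; it is an assumption about the form of the equation referenced in the text. The binned Jackknife helper and all function names are illustrative.

```python
def jackknife_variance_of_mean(series, bin_size):
    """Jackknife variance of the sample mean after averaging over bins of length bin_size."""
    n_bins = len(series) // bin_size
    bins = [sum(series[k * bin_size:(k + 1) * bin_size]) / bin_size
            for k in range(n_bins)]
    total = sum(bins)
    # Leave-one-bin-out estimates of the mean.
    jk = [(total - b) / (n_bins - 1) for b in bins]
    jk_mean = sum(jk) / n_bins
    return (n_bins - 1) / n_bins * sum((m - jk_mean) ** 2 for m in jk)

def tau_from_variances(var_plateau, var_plateau_err, var_b1):
    """tau_int and its error from the plateau variance and the B = 1 variance (assumed relation)."""
    tau = var_plateau / (2.0 * var_b1)
    tau_err = var_plateau_err / (2.0 * var_b1)
    return tau, tau_err

# Values quoted in the text for the HMC plaquette history.
tau, err = tau_from_variances(1.41e-9, 0.53e-9, 7.53e-11)
print(f"tau_int = {tau:.2f} +/- {err:.2f}")  # tau_int = 9.36 +/- 3.52
```

In an actual analysis one would call `jackknife_variance_of_mean` for increasing bin sizes and read off the plateau before applying `tau_from_variances`.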

Finally, the lag-differencing method (cf. Sec. 3.2.5) is applied to the time series. As a first step, the order-1-lag-30-differenced series, $D^{(1)}_{30}(\text{Plaquette})$, is computed using the definition (3.26). It is displayed in Fig. 4.14. As the next step, the correlation between the plaquette and $D^{(1)}_{30}$ is computed, cf. (3.27). The normalized correlation function, $\Gamma_{A,(D^{(1)}_{30}A)}(t)$, together with its integral as a function of the cutoff is shown in Fig. 4.15. The former is given by the dotted curve, while the latter is visualized by the blue line. The windowing method proposes the values $\tau_{\text{int}} = 5.94 \pm 0.34$ (green line) and $\tau_{\text{int}} = 10.26 \pm 0.55$ (red line) for windows of $c = 4$ and $c = 6$, respectively. The maximum of the integral, however, is reached at $\tau_{\text{int}} = 12.11 \pm 0.48$. It is obvious that there is no significant improvement from the differencing prescription and that the resulting function $\Gamma_{A,(D^{(1)}_{30}A)}(t)$ is not compatible with a single exponential mode, just as the original function $\Gamma_{AA}(t)$ was not. Furthermore, the maximum of the curve has shifted to the left, and the windowing prescription no longer gives the maximum at $c = 6$. In fact, the differencing prescription impairs the statistics if the lag is large compared to the "true" autocorrelation time.

Figure 4.13: Variance $\sigma_B(\text{Plaquette})$ vs. the bin size $B$ using the Jackknife method for the HMC plaquette history.

Figure 4.14: Order-1-lag-30 differenced series of the HMC plaquette history.



To obtain a better result from the lag-differencing method, one has to repeat the procedure leading to the estimate for $\tau_{\text{int}}$ using a series of different lags and look for the stability of results. As long as the lag $l$ stays above the autocorrelation time, no physical modes should get lost. Once the autocorrelation time obtained becomes as large as or larger than the lag $l$, one may cut off physical modes. Thus, in accordance with the discussion in Sec. 3.2.3, one would look for a plateau at some intermediate values of $l$, where the autocorrelation function should exhibit an exponential behavior.
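
The effect of the differencing step on a drifting series can be sketched in Python. The sketch assumes that the order-1 lag-$l$ difference is $D_l A(t) = A(t+l) - A(t)$; the sign and index convention of the definition referenced in the text may differ. The function name and the synthetic series are hypothetical. A linear drift, which would bias the autocorrelation function, is reduced to a constant offset, while short-range fluctuations survive.

```python
def lag_difference(series, lag):
    """Order-1 lag-`lag` difference d[t] = a[t + lag] - a[t] (assumed convention)."""
    if not 0 < lag < len(series):
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    return [series[t + lag] - series[t] for t in range(len(series) - lag)]

def mean(values):
    return sum(values) / len(values)

# Synthetic series: alternating fluctuation of unit size on top of a linear drift.
series = [(-1.0) ** t + 0.01 * t for t in range(103)]
diffed = lag_difference(series, 3)

# The halves of the original series have different means because of the drift ...
print(mean(series[:50]), mean(series[50:100]))
# ... while both halves of the differenced series share the constant offset slope * lag.
print(mean(diffed[:50]), mean(diffed[50:]))
```

Scanning the lag then amounts to repeating the autocorrelation analysis on `lag_difference(series, l)` for a range of values of `l` and looking for a plateau in the resulting estimates.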
The results from this analysis are displayed graphically in Fig. 4.16. Indeed, one finds a plateau reaching from $l \approx 22$ up to $l \approx 26$, giving rise to $\tau_{\text{int}} = 10.96 \pm 0.39$. The self-consistency criterion $\tau_{\text{int}} < l$ is clearly met. The question arises whether the differencing prescription does indeed result in a correlation function where the linear bias is suppressed. To address this question, Fig. 4.17 shows the correlation function for the case $l = 23$. Now the function indeed decays to zero already at a small value of the cutoff, but still increases later on. This may be no exponential mode, but a polynomial mode giving rise to a higher-order bias. Although in theory one could get rid of this bias by considering a higher-order differencing scheme, the impact of this procedure on the quality of statistics would invalidate this approach rather quickly. In particular, if the quality of the series were good enough to allow for higher-order differencing, the impact of the bias would be significantly smaller in the first place.
The lesson from this investigation is that the linear bias can be removed by applying the lag-differencing prescription, and the result obtained in this way is consistent with the one obtained from the original autocorrelation function and from the Jackknife method. The analysis shows that for the HMC with dynamical fermions one has to use a time series with a length of at least 4000 trajectories to gain accurate information about the true autocorrelation behavior.

Figure 4.15: Correlation function $\Gamma_{A,(D^{(1)}_{30}A)}(t)$ together with its integral vs. the cutoff.

Figure 4.16: Integrated autocorrelation time as obtained from the lag-differencing method for varying lags $l$.



Case II: Large Fluctuations 

The second series has been obtained from the history of the average plaquettes using the TSMB algorithm discussed in Sec. 3.5.3. The simulation has been performed using the same physical parameters as in the previous case except for the volume and the algorithmic parameters given in Tab. 4.3. The first polynomial order was $n_1 = 20$. This run is part of the tuning series discussed in Sec. 4.2.2 below. Reweighting of the observables has been neglected, since this would have introduced another source of autocorrelation effects, see [230] for a discussion. The total length of the series was 51196 trajectories, where the thermalization phase has already been subtracted.

Figure 4.17: Similar to Fig. 4.15 but with a differencing lag $l = 23$.

Figure 4.18: Time series of the average plaquette from the TSMB algorithm.




As has already been pointed out, a very long mode possibly related to the shielding transition is present 
in the series which cannot be examined in a time series of such a length. However, since this mode does 
not appear to have any connection to the other, short-range fluctuations, it may be questioned if it has 
any significance for the efficiency considerations of the multiboson algorithms. 

As in the previous case, first the autocorrelation function is visualized together with the corresponding integral in Fig. 4.19. The estimated autocorrelation time is $\tau_{\text{int}} = 1897 \pm 135$. Again, the problem is that the autocorrelation function does not go to zero exponentially and that it appears to reach a plateau, before it drops to zero and then decreases linearly. This behavior may indicate a linear bias, which could be removed by the lag-differencing method.

Figure 4.19: Autocorrelation function and the corresponding integral as a function of the cutoff for the plaquette history from the TSMB run.



The variance obtained from the Jackknife analysis for different bin sizes is shown in Figure 4.20. The plateau can be estimated to lie at about $\sigma(\text{Plaquette}) \approx (5.855 \pm 2.422) \times 10^{-7}$. Together with $\sigma_{B=1}(\text{Plaquette}) = 2.032 \times 10^{-10}$ one obtains an estimate of $\tau_{\text{int}} \approx 1441 \pm 596$. This number is compatible with the previous estimate; however, it should not be trusted since the original time series was contaminated by a mode too long to be reliably examined.

Finally, the lag-differencing method has to shed some light on the behavior of the autocorrelation time. Figure 4.21 displays the results from measuring the integrated autocorrelation time with various lags. If the long-range mode is indeed separated from the other modes, one should be able to see a plateau from the other modes after the long-range mode has been cut out. There is a clear signal for the formation of such a plateau at lags between $l = 600$ and $l = 800$. Using the error bars from the single points and making a linear fit yields $\tau_{\text{int}} = 334.3 \pm 65.6$. This result is about a factor of four below the previous estimates.

Figure 4.20: Jackknife variance calculated for different bin sizes for the time series from the TSMB run.

Figure 4.21: Integrated autocorrelation time vs. the differencing lag for the TSMB run.



In conclusion, the lag-differencing method makes it possible to remove a linear bias and thus enables the evaluation of a series with large fluctuations. The stability criterion is met, i.e. the estimated autocorrelation time exhibits a plateau when plotted as a function of the lag $l$. The self-consistency criterion is also met, i.e. the resulting value for $\tau_{\text{int}}$ is smaller than the lags $l$ in the plateau region. A large autocorrelation mode associated with a fluctuation that is expected to vanish with increasing volume has successfully been cut off. In the following section, a series of these runs will be presented, which are all identical except that a single parameter has been varied between the runs.



4.2.2 Acceptance Rates vs. Polynomial Approximation Quality 

It has already been argued that the number of boson fields enters linearly into the autocorrelation time of a multiboson algorithm. On the other hand, one can expect that a small number of boson fields gives rise to a small acceptance rate and hence to an increase in the autocorrelation time again for small numbers of fields. The number of boson fields and thus the first polynomial order is of critical importance for a multiboson algorithm. Until today, however, no systematic analysis of this effect has been performed and the impact of this choice on practical simulations is unclear. This is certainly related to the fact that any systematic analysis is complicated by the requirement to measure autocorrelation times with a reasonable accuracy. Therefore, we base our study on very long runs. Beyond that, we employ the efficient tools described in detail in the previous section.

The algorithmic parameters shared by all runs are displayed in Tab. 4.3. Only the order of the first polynomial, $n_1$, has been varied.

$n_1$   $n_2$   $n_3$   $[\epsilon, \lambda]$          Updates/Trajectory
var.    160     200     $[7.5 \times 10^{-4}, 3]$      1 boson HB, 3 boson OR, 2 gauge Metropolis, 1 noisy corr.
Volume: $\Omega = 8^4$

Table 4.3: General algorithmic parameters for high-statistics TSMB runs.

Table 4.4 shows the statistics generated together with the acceptance rates of the noisy correction step and the cost of a single trajectory. These runs have been performed on the ALiCE computer cluster installed at Wuppertal University ³. The machine configurations were both a single-node configuration (with no parallelization) and a four-node partition with the lattice parallelized in the z- and t-directions. The local lattice size was consequently $\Omega_{\text{loc}} = 4 \times 8 \times 8 \times 4$. The numerical efforts are given for the latter situation. They are specified in terms of multiplications of the preconditioned fermion matrix $Q^2(y,x)$ with an arbitrary colorspinor $\eta(x)$.

$n_1$   Number of confs.   Acceptance rate   Numerical effort / MV-mults
12      101111             8.00%             463.0
18      62462              39.61%            499.6
20      61196              51.51%            511.8
22      42248              57.94%            524.0
24      49704              64.56%            536.2
26      50684              69.45%            548.4
28      50412              74.46%            560.6
32      50238              80.84%            585.0

Table 4.4: Runs for the parameter tuning of the TSMB algorithm.

Since the TSMB algorithm uses non-adaptive polynomials in the noisy correction step, the number of explicit matrix-vector multiplications is straightforwardly given by $n_2 + n_3$. In the case under consideration we thus have $n_2 + n_3 = 360$. To estimate the total effort we assume that the efficiency of the implementation for the local algorithms is roughly equivalent to the efficiency of the matrix-vector multiplication routine ⁴. We therefore measure the time needed for a complete trajectory, $t_{\text{traj}}$, and the time needed for the noisy correction alone, $t_{\text{noisy}}$. Using these times, we can define the total effort

$$ E_{\text{MV-mults}} = (n_2 + n_3)\, \frac{t_{\text{traj}}}{t_{\text{noisy}}} . \tag{4.8} $$

Behavior of the Correction Factor 

As a first step, the dependence of the acceptance rate on the magnitude of the exponential correction factor $\exp\left(-C_{12}\{U_{\text{old}}, U_{\text{new}}\}\right)$ should be clarified. Figure 4.22 shows the exponential correction factor together with its standard deviation. It depends exponentially on the order $n_1$. The function approximating the average value is given by

$$ [\exp(-C_{12})](n_1) = A \cdot \exp(-B \cdot n_1) , \qquad A = 78.657 , \quad B = 0.23092 . \tag{4.9} $$

³ See http://www.theorie.physik.uni-wuppertal.de/Computerlabor/ALiCE.phtml for technical details and further information.
⁴ This assumption is only roughly valid, leading to a machine- and compiler-dependence of the effort defined in this way. See Sec. 4.3.3 for a thorough discussion.
⁵ Actually the total time may fluctuate due to the load of the whole communication. Therefore, the results in Tab. 4.4 have been averaged.

Figure 4.22: Dependence of the exponential correction factor on the number of boson fields, $n_1$.



This behavior is in line with the expectation that the convergence of the first polynomial is exponential. On the other hand, the standard deviation of the exponential correction does not follow a precise exponential dependence; this is shown in Fig. 4.23.

Figure 4.23: Dependence of the standard deviation of the exponential correction factor on the number of boson fields, $n_1$.

Restricted to the intermediate regime, an exponential function nevertheless yields a good fit to the data points:

$$ \sigma\left([\exp(-C_{12})]\right)(n_1) = A' \cdot \exp(-B' \cdot n_1) , \qquad A' = 6.6765 , \quad B' = 0.075757 . \tag{4.10} $$

Finally, inserting the models (4.9) and (4.10) into Eq. (4.6) makes it possible to check their validity by comparing them to the measured acceptance rates from Tab. 4.4. Table 4.5 compares the predicted and the measured acceptance rates. The numbers agree closely; for the rows with $n_1 \ge 24$ the predictions lie between 0.5 and 2.3 percentage points above the measurements.

In conclusion, the exponential correction factor shows an exponential dependence on the order $n_1$. Its standard deviation also approximately follows an exponential decay. Therefore, the acceptance rate can be predicted as a function of the polynomial order $n_1$ once at least two points have been determined which allow one to fit the functions (4.9) and (4.10).
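
This prediction can be sketched in Python. The sketch assumes that the fitted average of Eq. (4.9) plays the role of the mean $b$ and the fitted standard deviation of Eq. (4.10) the role of the width $\sigma$ in the Gaussian acceptance model of Eq. (4.6). Under that reading the $n_1 = 24$ and $n_1 = 32$ entries of the prediction table are reproduced, but the identification remains an interpretation. The function names are illustrative.

```python
import math

# Fit parameters quoted in Eqs. (4.9) and (4.10).
A, B = 78.657, 0.23092
A_SIGMA, B_SIGMA = 6.6765, 0.075757

def p_acc(sigma, b):
    """Acceptance rate of the Gaussian model with mean b and width sigma."""
    s = math.sqrt(2.0) * sigma
    always = 0.5 * (1.0 - math.erf(b / s))
    sometimes = 0.5 * (1.0 - math.erf((sigma**2 - b) / s)) * math.exp(-b + 0.5 * sigma**2)
    return always + sometimes

def predicted_acceptance(n1):
    """Predicted acceptance rate of the noisy correction step for first polynomial order n1."""
    b = A * math.exp(-B * n1)                      # assumed: mean of the exponent
    sigma = A_SIGMA * math.exp(-B_SIGMA * n1)      # assumed: width of the exponent
    return p_acc(sigma, b)

for n1 in (24, 26, 28, 32):
    print(n1, f"{100.0 * predicted_acceptance(n1):.2f}%")
```

For $n_1 = 24$ this gives about 66.8% and for $n_1 = 32$ about 81.4%, in line with the predicted values quoted in the comparison table.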

Fermionic Energy 

First consider the fermionic action S { . This quantity is not affected by the correction step, since in the 
trajectory in Tab. 14.31 the correction step may only reject an update of the gauge field. However, it is 
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57.94% 


24 


66.84% 


64.56% 


26 


71.59% 


69.45% 


28 


75.45% 


74.46% 


32 


81.37% 


80.84% 



Tabic 4.5: Comparison of acceptance rates as a function of n\ from the predictions of Eq. 1)4. 6|) and 
from the actuals runs in Tab. 14.41 



still linked to the full dynamics of the system by its coupling to the gauge field. Thus, it is expected 
to display a linear dependence on the number of boson fields, ri\. Figure B.24I shows the integrated 
autocorrelation times r int (^f) versus the polynomial order, n\. The autocorrelation times have been 
measured using the windowing prescription from Sec. l3.2.5l on the integrated autocorrelation function. 

Figure 4.24: Integrated autocorrelation time $\tau_{\rm int}(S_f)$ vs. the number of boson fields $n_1$.

One finds a linear dependence on the number of boson fields. Certainly, the small absolute values of
the fermionic autocorrelation times help to make the measurement very precise. However, the fermionic
energy does not directly give rise to any useful physical information. Rather, quantities computed
directly from the gauge fields (like the plaquette) carry physical information about the system.
Since the "fermionic force" on the gauge field is directly related to the boson fields, one can nonetheless
expect the influence on the autocorrelation time of gauge-field related quantities to be linear in $n_1$.
Unfortunately, the situation is far more involved in that case.
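The integrated autocorrelation times quoted in this subsection rely on the windowing prescription of Sec. 3.2.5, which is not reproduced here. As a rough illustration, the following Python sketch uses the common automatic-windowing rule: the normalized autocorrelation function is summed up to the first window $W$ with $W \ge c\,\tau_{\rm int}(W)$. The function names, the constant `c = 6` and the convention $\tau_{\rm int} = 1/2 + \sum_{t \ge 1} \rho(t)$ are assumptions of this example and may differ from the choices made in the thesis.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation function rho(t) of a time series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    # Zero-padding to length 2n yields the linear (non-circular) autocovariance.
    f = np.fft.rfft(d, 2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n] / (n - np.arange(n))
    return acov / acov[0]

def tau_int_windowed(x, c=6.0):
    """Integrated autocorrelation time with an automatic summation window.

    Returns (tau_int, window). The sum is cut at the first W >= c * tau_int(W).
    """
    rho = autocorrelation(x)
    tau = 0.5
    for w in range(1, rho.size):
        tau += rho[w]
        if w >= c * tau:
            return tau, w
    # No self-consistent window found: the time series is too short.
    return tau, rho.size - 1
```

If the loop runs to the end of the series without finding a window, the result should not be trusted; this corresponds to the situation in which the statistics are insufficient for the observable under consideration.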

Gauge Field Plaquette 

Since the plaquette is a purely gauge-field dependent quantity, its autocorrelation time will be affected
by the acceptance rate. It can be expected to increase at too small $n_1$ because of the correlations caused
by identical configurations. Furthermore, as already noted above, the plaquette is very noisy
and thus very difficult to measure. The desired behavior will therefore be embedded in huge
fluctuations, and a standard analysis of the effect is bound to fail. It is in this situation that
the lag-differencing method becomes important and provides a useful source of information.
The integrated autocorrelation time as a function of the differencing lag is shown in Fig. 4.25 for all
available values of $n_1$. The errors are larger than in the case of $S_f$ since the autocorrelation times are
now larger and thus the statistics for this observable are worse. The left diagram in the second row has
already been discussed in Sec. 4.2.1, Fig. 4.21.

In the case $n_1 = 12$ a plateau is clearly visible, indicating that the lag-differencing yields a stable
solution. In the case $n_1 = 18$ the situation is less clear. A pseudo-plateau may be suspected around
$l = 600$, but in general the method is unstable and the result should be disregarded. The case $n_1 = 20$
has been discussed in Sec. 4.2.1, while $n_1 = 22$ is again very stable with a plateau around
$l = 600$. Nothing can be learned from $n_1 = 24$; there is no plateau and any attempt
to find one is futile. For $n_1 = 26$ a clear plateau is again visible, starting at about $l = 650$. For
$n_1 = 28$ a plateau can be found from $l = 400$ to $l = 600$. For $n_1 = 32$, again no result can be found.
For comparison, the Jackknife procedure as discussed in Sec. 3.2.5 has been applied to all samples. The
variances vs. the bin sizes are displayed in Fig. 4.26. The straight lines give the plateau values. Again,
the first graph in the second row is identical to Fig. 4.20 in Sec. 4.2.1.
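The way a plateau in the Jackknife variance translates into an autocorrelation time can be sketched as follows. The example bins the time series, computes the Jackknife variance of the mean for a given bin size, and converts the ratio to the naive variance into $\tau_{\rm int}$ under the convention $\tau_{\rm int} = 1/2$ for uncorrelated data. The function names and this convention are assumptions of the sketch; the precise procedure of Sec. 3.2.5 is not reproduced in this section.

```python
import numpy as np

def jackknife_variance_of_mean(x, bin_size):
    """Jackknife estimate of the variance of the sample mean for a given bin size."""
    x = np.asarray(x, dtype=float)
    n_bins = x.size // bin_size
    # Discard the incomplete last bin and average within each bin.
    bins = x[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    # Leave-one-bin-out averages.
    jk_means = (bins.sum() - bins) / (n_bins - 1)
    return (n_bins - 1) / n_bins * np.sum((jk_means - jk_means.mean()) ** 2)

def tau_int_from_binning(x, bin_size):
    """tau_int from the ratio of the binned to the naive variance of the mean."""
    x = np.asarray(x, dtype=float)
    naive = x.var(ddof=1) / x.size
    return 0.5 * jackknife_variance_of_mean(x, bin_size) / naive
```

In practice one evaluates `tau_int_from_binning` for a range of bin sizes and reads off the plateau; unlike the windowing approach, this gives no systematic control over the error of the estimate.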

The resulting integrated autocorrelation times read off from Figs. 4.25 and 4.26 are given in Tab. 4.6.
Figure 4.27 displays these results graphically.

The conclusion is that there is no measurable increase in the autocorrelation time as $n_1$ is increased if one
relies solely on the lag-differencing method. The increased acceptance rate from larger $n_1$ compensates
for the loss of mobility in phase space. From this point of view it appears reasonable to simulate at
comparatively small acceptance rates.

The Jackknife method exhibits a similar behavior, but it is compatible with a decrease of the
autocorrelation time with increasing $n_1$. There is no indication, however, that this decrease exceeds a factor of
two, see the cases $n_1 = 12$ and $n_1 = 32$. In fact, it is very likely that there is a non-trivial dependence
(which results in differences of the order of a factor of two, but not much larger) which will become
visible if one performs the same runs with far larger statistics. This is impossible with current
computer technology, and would not be worth the effort given that its influence is so small. The
conclusion is therefore unchanged.



  $n_1$   $\tau_{\rm int}$ from LDM   Lag $l$       $\tau_{\rm int}$ from Jackknife
  12      288.0 ± 20.5                400 - 800     885 ± 679
  18      320.1 (?)                   600 (?)
  20      334.3 ± 65.6                600 - 800     1441 ± 596
  22      174.3 ± 15.3                600 - 700     203 ± 54
  24                                                680 ± 333
  26      344.3 ± 73.7                650 - 1000    779 ± 335
  28      212.9 ± 29.6                400 - 600     409 ± 390
  32                                                386 ± 89

Table 4.6: Integrated autocorrelation times of the plaquette together with their standard errors as read
off from Fig. 4.25.

Figure 4.25: Integrated autocorrelation times of the plaquette together with their standard errors as a
function of the differencing lag for various $n_1$.

Figure 4.26: Jackknife variances as a function of the bin size for various values of $n_1$.

Figure 4.27: Integrated autocorrelation times of the plaquette vs. $n_1$.



4.2.3 Dynamical Reweighting Factor 

Of particular importance is the stability of the polynomial approximation with respect to the interval
$[\epsilon, \lambda]$ chosen for the TSMB algorithm. To assess this problem, three runs have been performed with
different choices for the lower limit $\epsilon$. The physical parameters were again chosen according to Tab. 4.1
on an $\Omega = 8^4$ lattice using the TSMB algorithm with polynomial order $n_1 = 20$ and parameters as in
Tab. 4.3 apart from the value of $\epsilon$. The lower limit has been varied to be $\epsilon = 4.5 \times 10^{-4}$ in the first,
$\epsilon = 6.0 \times 10^{-4}$ in the second, and $\epsilon = 7.5 \times 10^{-4}$ in the final case. These polynomials are visualized in
Fig. 4.28 for the lower end of the approximation intervals.

Figure 4.28: Polynomials $P_{n_1}(\lambda) \cdot P_{n_2}(\lambda) \simeq P_{n_1+n_2}(\lambda)$ for the three different values of the lower limit.

The histories of the reweighting factors measured during the runs are displayed in Fig. 4.29. They have
been computed from the lowest 32 eigenvalues (cf. Sec. 4.1.3).

Figure 4.29: Reweighting factors for the three values of $\epsilon$.

Figure 4.30: Accumulated eigenvalue histograms for the 32 lowest eigenvalues of $\tilde{Q}^2$ for the three runs.

In the first case, $\epsilon_1 = 4.5 \times 10^{-4}$, the correction factor shows the largest fluctuation of all cases:
$A_1 = 1.006990 \pm 0.028267$. For the second case, $\epsilon_2 = 6.0 \times 10^{-4}$, one obtains a correction factor of
$A_2 = 1.005144 \pm 0.016978$. In the third case with $\epsilon_3 = 7.5 \times 10^{-4}$ the correction factor drops down
to 0.75 at one point, meaning that the lowest eigenvalue left the region where the polynomial approximation is good
enough; although this appears to be a problem, it only happens at a single place and never repeats. If
this outlier is not disregarded, the correction factor is found to be $A_3 = 0.997461 \pm 0.021883$. Dropping
the first 500 trajectories from the sample (which is certainly justified, since they make up only 5% of the
total runtime), one arrives at $A_3 = 0.999619 \pm 0.012296$. When computing any observable, however,
these configurations must not be disregarded, since the algorithm would otherwise not be ergodic. If the
correction factor is included in the measurement via Eq. (3.80), the algorithm will be both exact and ergodic.
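Eq. (3.80) is not reproduced in this section. Assuming it has the standard reweighting form $\langle O \rangle = \langle O\,A \rangle / \langle A \rangle$, where $A$ is the correction factor measured on each configuration, the measurement step can be sketched in Python as follows. The function name and the sample numbers in the usage note are hypothetical.

```python
import numpy as np

def reweighted_mean(observable, weight):
    """Reweighted average <O A> / <A> over a sample of configurations.

    observable: values of O on each configuration
    weight:     correction (reweighting) factor A on the same configurations
    """
    obs = np.asarray(observable, dtype=float)
    w = np.asarray(weight, dtype=float)
    if obs.shape != w.shape:
        raise ValueError("observable and weight must have the same shape")
    return np.sum(obs * w) / np.sum(w)
```

For example, `reweighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])` gives 2.25, while unit weights reproduce the plain average. A configuration with a small correction factor, such as the outlier discussed above, is thereby suppressed rather than removed from the sample.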
As is apparent from Fig. 4.28 (see also Fig. 4.3), the correction factor oscillates for the contributions
from the smallest eigenvalues when using the static inversion of the TSMB algorithm. Consequently,
an eigenvalue which is located in a "valley" will contribute with a smaller factor than an eigenvalue
located on a local maximum. The static inversion of the TSMB algorithm may prefer to accumulate the
eigenvalues in the local minima compared to the distribution sampled by an adaptive inverter. Since
the distributions from the adaptive inversion (or from the reweighted static inversion, respectively)
do not know about the existence of the oscillations, one may suspect that this distortion of phase
space introduces notable stochastic forces. If these forces were present, they could result in larger
autocorrelation times since motion of the eigenvalues between separate valleys would be suppressed.
Figure 4.30 shows the histograms of the 32 lowest eigenvalues from the three histories of the runs,
computed every 100 trajectories.

There does not appear to be any correlation between the eigenvalue fluctuations and the shape of the
polynomials. As a result, one can say that the static matrix inversion employed in the TSMB algorithm
does not result in significant stochastic forces. Furthermore, it is possible to make a choice of $\epsilon$ guided
by the optimal lower limits found in Sec. 4.1.1. If the lowest eigenvalue ever leaves this interval, the
correction factor will suppress these configurations. On average, it will be smaller (with all other
parameters fixed) than for any choice with smaller $\epsilon$. The magnitude of the reweighting factor is then
controlled by the choice of $n_2$. Since a large fluctuation of the reweighting factor impairs the statistics,
the value of $n_2$ should be chosen such that the deviation of the reweighting factor from one is always
below the statistical error. This choice ensures that its influence is so small that the statistical quality
of the sample is not perturbed too much. Alternatively, one can also compute the determinant norm
$\|R(\tilde{Q}^2)\|$ given by Eq. (4.3) for the second polynomial and keep its error below the statistical one.
The latter procedure is simpler and still gives a good handle on the quality attained.

On the other hand, one should refrain from making $n_2$ far too large and the lower limit $\epsilon$ too small;
this choice will simply increase the computer time required and make the algorithm inefficient.

4.2.4 Updating Strategy 

After the recipe for choosing the polynomials and their orders has been fixed, the focus is finally placed on
the tuning of the updating algorithms for sampling a new configuration (the transition matrix of the
Markov process). The available algorithms have all been discussed in detail in Sec. 3.4.

Boson Field Updates 

The boson field updates should be performed using a global heatbath (cf. Sec. 3.4.2) to achieve optimal
decorrelation. This fact is already known theoretically from [196] and has been confirmed numerically
in [221]. The global updates may, however, be accompanied by local updates, see e.g. [197].
Prior to a simulation run, the boson fields should be thermalized. This can best be achieved by
holding the gauge field configuration fixed and updating the boson fields only. Observing the efficiency
of thermalization methods also sheds some light on the best combination of local updating
algorithms. Figure 4.31 shows a history of the fermionic energy, $S_f$, starting from a random boson field
configuration. It displays the effect of local overrelaxation sweeps on the thermalization. The runs
have been performed using an $\Omega = 4^4$ lattice. The physical parameters have been chosen as in Tab. 4.1
and the polynomial order has been chosen to be $n_1 = 60$. This makes no sense for a production run
on this lattice size, but since the local boson field updates factorize, there should be no dependence on
the number $n_1$ chosen. The interval of the polynomial has been chosen to be $[\epsilon, \lambda] = [7.5 \times 10^{-4}, 3]$.
A trajectory always consisted of a local boson heatbath update and either 0, 1 or 3 local boson field
overrelaxations. It is obvious that local boson overrelaxations improve the thermalization rate and
thus should also be expected to decrease the exponential autocorrelation time.



Figure 4.31: Boson field thermalization using local boson field updating algorithms. The curves show
runs with 0, 1 and 3 overrelaxations as a function of the trajectory number.

Of course, in practical implementations, one should not use the local updating algorithms for the 
thermalization, but instead directly use the global boson heatbath. 
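The interplay of local heatbath and overrelaxation sweeps can be made concrete with a toy model. The Python sketch below does not implement the multiboson action; it uses a one-dimensional free scalar field with a quadratic action, chosen here only because the boson action is likewise quadratic in the boson fields, so that the local heatbath is a Gaussian draw and the overrelaxation is a reflection about the local minimum. The model, the mass parameter `m2` and all function names are assumptions of this illustration.

```python
import numpy as np

def action(phi, m2):
    """S = sum_x [ m2/2 * phi_x^2 + 1/2 * (phi_{x+1} - phi_x)^2 ], periodic."""
    return 0.5 * m2 * np.sum(phi ** 2) + 0.5 * np.sum((np.roll(phi, -1) - phi) ** 2)

def local_minimum(phi, x, m2):
    """Value of phi_x that minimizes the action with the neighbors held fixed."""
    n = phi.size
    return (phi[(x - 1) % n] + phi[(x + 1) % n]) / (m2 + 2.0)

def heatbath_sweep(phi, m2, rng):
    """Local heatbath: draw each site from its exact conditional Gaussian."""
    for x in range(phi.size):
        phi[x] = local_minimum(phi, x, m2) + rng.normal() / np.sqrt(m2 + 2.0)

def overrelaxation_sweep(phi, m2):
    """Local overrelaxation: reflect each site about its local minimum.

    Each reflection leaves the action unchanged, so the sweep is not ergodic
    on its own and has to be combined with a heatbath.
    """
    for x in range(phi.size):
        phi[x] = 2.0 * local_minimum(phi, x, m2) - phi[x]
```

A thermalization run in the spirit of Fig. 4.31 would start from a far-from-equilibrium configuration and record `action(phi, m2)` after each trajectory consisting of one heatbath sweep and a chosen number of overrelaxation sweeps.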

Gauge Field Updates 

For the gauge field updates one can either use a local Metropolis algorithm (Sec. 3.4.1), a local heatbath
(see Sec. 3.4.2), or local overrelaxations (Sec. 3.4.3). It turns out that after a certain number of gauge
field updates has been applied the acceptance rate of the noisy correction step stays essentially constant.
To demonstrate this behavior, a simulation run at the physical parameters given in Tab. 4.7 on a lattice
with $\Omega = 8^4$ has been performed. This run is a part of the investigations performed in Sec. 6.3.1.
Table 4.8 shows the algorithmic parameters in this study. The only parameter varied is the number of
gauge field Metropolis sweeps, where a Metropolis algorithm with eight hits per single link has been
used.



  $N_f^{\rm sea}$   $\beta$   $\kappa$
  3                 5.3       0.150

Table 4.7: Physical parameters for the investigation of the gauge field updating sequence.



  $n_1$   $n_2$   $n_3$   $[\epsilon, \lambda]$   Updates/Trajectory
  24      100     140     [1 x 10-^, 3]           2 boson HB, 6 boson OR, var. gauge Metropolis, 1 noisy corr.

  Volume: $\Omega = 8^4$

Table 4.8: Algorithmic parameters for the investigation of the gauge field updating sequence.



The resulting values of the exponential correction together with their standard deviations and the
corresponding acceptance rates are shown in Tab. 4.9.

  Sweeps   Exponential correction   Standard deviation   Acceptance rate
           0.663                    1.125                59%
  4        1.000                    1.421                48%
  6        1.350                    1.371                41%
  8        1.188                    1.523                52%
  12       1.120                    1.429                48%

Table 4.9: Exponential correction with standard deviation and the corresponding acceptance rates as a
function of the number of gauge field Metropolis sweeps.



The average exponential correction increases slightly with an increasing number of sweeps between the
noisy corrections. However, this increase is accompanied by a slight increase in the standard deviation.
Already after six sweeps have been performed, the changes can be attributed to fluctuations. The net
effect is that the acceptance rate does not vary by more than 10%. Hence, one finds that the
acceptance rate will indeed saturate once a certain number of updates has been applied to the gauge field.
Therefore it is possible to choose a rather large number of gauge field updates between the noisy
corrections.

Choice of Updating Sequence 

During a single trajectory, all degrees of freedom (d.o.f.) of the system need to be updated. Therefore, a
trajectory always consists of a certain number of boson field updates and a certain number of gauge field
updates followed by a noisy correction step. The optimal sequence, however, might also depend on the
architecture used for the configuration sampling. The reason is that the ratio of local to global update
sweeps itself depends on the architecture used, see Sec. 4.3 below for details.

The acceptance rate is only slightly influenced by the number of local gauge field sweeps, as has been
shown in Sec. 4.2.2. Moreover, the noisy correction does not contribute to any reduction
of the autocorrelation time, while the update sweeps do. Consequently, one should always keep the
number of local updates in a trajectory as large as possible, so as to minimize the contribution of the
noisy correction step to the total runtime.
Now two different kinds of trajectories are proposed: 

• Perform a number of boson field updates, followed by a number of gauge field updates and a 
correction step. 

• Perform an alternating sequence of gauge and boson field updates prior to a correction step. 

Since the correction step only depends on the gauge field configuration, but not on the boson fields,
a rejection in the first proposal does not require restoring the boson field configuration,
since it has always been obtained in the background of a "valid" gauge field configuration.
Conversely, the second scheme will have to restore both the old boson and gauge field configurations in
case the update is rejected. Hence, the memory requirements of the second proposal will be significantly
larger than in the first case.
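The difference in bookkeeping between the two kinds of trajectories can be summarized in a schematic Python sketch. The update routines and the accept/reject decision are passed in as callables and are purely hypothetical placeholders; the sketch only shows which configurations have to be kept in memory for a possible rejection.

```python
import copy

def trajectory_sequential(gauge, boson, update_boson, update_gauge, accept):
    """First proposal: boson updates, then gauge updates, then the correction.

    The boson fields are updated in the background of the currently accepted
    gauge field, so only the old gauge field is needed if the step is rejected.
    """
    boson = update_boson(boson, gauge)
    proposal = update_gauge(copy.deepcopy(gauge), boson)
    if accept(gauge, proposal):
        gauge = proposal
    return gauge, boson

def trajectory_alternating(gauge, boson, update_boson, update_gauge, accept, n_pairs):
    """Second proposal: alternating gauge and boson updates before the correction.

    Both old configurations must be stored, since the boson fields have been
    updated in the background of gauge fields that may be rejected.
    """
    old_gauge, old_boson = copy.deepcopy(gauge), copy.deepcopy(boson)
    for _ in range(n_pairs):
        gauge = update_gauge(gauge, boson)
        boson = update_boson(boson, gauge)
    if accept(old_gauge, gauge):
        return gauge, boson
    return old_gauge, old_boson
```

Since there are $n_1$ boson fields but only one gauge field, the extra copy in the alternating scheme dominates the memory footprint, which is the point made in the paragraph above.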

These considerations do not yet fix the optimal mixture of gauge and boson field updates. A very
simple proposal is to use a single local gauge overrelaxation sweep followed by a single local boson
overrelaxation sweep. The overrelaxation sweeps allow for a very fast movement through phase space,
but do not ensure ergodicity, see Sec. 3.4.3. So this sequence has to be complemented by at least one
ergodic heatbath and/or Metropolis sweep. This simple sequence turns out to be quite efficient in
practice, see Sec. 5.2.2 for an application.

However, given the fact that the fermionic energy decorrelates much faster than the gauge field plaquette
(see Sec. 4.2.2 above), one might hope that subsequent updates of the gauge field alone might still
decrease the plaquette autocorrelation, although essentially only a small subset of all d.o.f. is being
updated. The reason why this could be efficient is, as will be shown below in Sec. 4.3.4, that the
caching of the boson field contributions as proposed in App. C in Eqs. (C.14) and (C.15) allows for a
larger number of subsequent gauge-field-only update sweeps. On the other hand it is clear that this
effect will very soon lead to a saturation since the gauge field will thermalize with the fixed boson field
background.

The question when this saturation occurs can only be answered in a practical simulation. The physical
parameters have again been chosen from Tab. 4.1 and the algorithmic parameters are given in Tab. 4.10.
They are identical to the parameters given in Tab. 4.3.



  $n_1$   $n_2$   $n_3$   $[\epsilon, \lambda]$
  20      160     200     $[7.5 \times 10^{-4}, 3]$

  Volume: $\Omega = 8^4$

Table 4.10: Polynomial and lattice volume for runs with different updating sequences.

Table 4.11 shows the different sequences used for a number of simulations and the number of trajectories
computed together with the total cost in MV-Mults for a single trajectory. The machine configuration
used for all runs was an eight-node partition of the ALiCE computer cluster; parallelization was used
in the z- and t-direction, resulting in local lattices of $\Omega_{\rm loc} = 2 \times 8 \times 8 \times 4$.





                 Updating strategy                                      Trajectories   Cost/MV-Mults
  Sequence I     2 boson HB, 6 boson OR, 8 gauge OR, 1 noisy corr.      71400          1208.15
  Sequence II    1 boson HB, 3 boson OR, 16 gauge OR, 1 noisy corr.     68400          1403.77
  Sequence III   2 boson HB, 16 boson OR, 8 gauge OR, 1 noisy corr.     31400          1631.09
  Sequence IV    1 boson HB, 3 boson OR, 2 gauge Metro, 1 noisy corr.,
                 2 gauge OR, 1 noisy corr.                              40300          1123.14

Table 4.11: Different updating sequences used for a single trajectory.



Sequence I corresponds to an intermediate number of boson sweeps and gauge field sweeps. For the
gauge field, local overrelaxations have been used. Hence, ergodicity is ensured by the boson field heatbath
only. Sequence II applies a small number of boson field sweeps, but a large number of gauge field sweeps.
Sequence III applies a large number of boson field updates and an intermediate number of gauge field
updates. Sequence IV consists of a small number of boson field updates and only an intermediate
number of gauge field updates. The latter run makes direct contact with the run in Sec. 4.2.2 at
$n_1 = 20$, which will be denoted by "Sequence 0".

In the following, we will not only consider the autocorrelation times in terms of trajectories, $\tau_{\rm int}$, but
also the efficiencies of the algorithms, $E_{\rm indep}$. These efficiencies are defined as the number of MV-Mults
required per statistically independent gauge field configuration,

$$E_{\rm indep} = 2\, E_{\rm MV-mults}\, \tau_{\rm int}, \tag{4.11}$$

where $E_{\rm MV-mults}$ has been defined in Eq. (4.8) and $\tau_{\rm int}$ is the integrated autocorrelation time of the
observable under consideration, in this case the plaquette. Thus, the quantity $E_{\rm indep}$ is a measure
for the total cost which is, to a large extent, independent of the technical details of the underlying
algorithm.
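As a quick cross-check of Eq. (4.11), the following Python sketch evaluates $E_{\rm indep}$ from the cost per trajectory listed in Tab. 4.11 and the lag-differencing autocorrelation times listed in Tab. 4.12, and compares the result with the total efforts quoted in Tab. 4.12. The function and variable names are chosen for this illustration only; the numbers are copied from the two tables.

```python
def cost_per_independent_configuration(cost_per_trajectory, tau_int):
    """Eq. (4.11): MV-Mults per statistically independent configuration."""
    return 2.0 * cost_per_trajectory * tau_int

# (cost per trajectory in MV-Mults, tau_int from LDM, total effort quoted in Tab. 4.12)
sequences = {
    "I": (1208.15, 180.5, 436070),
    "II": (1403.77, 141.4, 397098),
    "III": (1631.09, 152.8, 498461),
    "IV": (1123.14, 175.2, 393503),
}

relative_deviation = {}
for name, (cost, tau, quoted) in sequences.items():
    effort = cost_per_independent_configuration(cost, tau)
    relative_deviation[name] = abs(effort - quoted) / quoted
```

The recomputed efforts agree with the tabulated ones to better than one part in a thousand; the small residual differences are consistent with the autocorrelation times being quoted to only one decimal place.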

The resulting plaquette autocorrelation times have been computed using both the lag-differencing and
the Jackknife method. Figure 4.32 shows the results for the former method. For Sequence I one finds
that a plateau emerges beyond $l = 500$, while for Sequence II it starts already around $l = 400$. Sequence
III exhibits a stable region between $l = 350$ and $l = 600$, while Sequence IV becomes steady beyond
$l = 700$. Hence, in all cases, the differencing method was able to yield conclusive results.



Figure 4.32: Integrated autocorrelation times of the plaquette together with their standard errors as a
function of the differencing lag for the different update sequences. (Panels: Sequence I, Sequence II,
Sequence III, Sequence IV.)



The Jackknife variances as a function of the bin size are displayed in Fig. 4.33. In all cases, plateaus
can be identified. Note again that there is no systematic control of the errors when using this method.

The resulting autocorrelation times and the total numerical efforts (as computed from the
lag-differencing method) are summarized in Tab. 4.12. The data for Sequence 0 have been taken from Tab. 4.6.

When examining the integrated autocorrelation times one finds that indeed some gain can be achieved
by increasing the number of local gauge field sweeps. However, the effect clearly saturates already after
as few as four consecutive sweeps have been performed in Sequence IV. On the other hand, increasing
the number of boson field sweeps in Sequence III did not produce any practical gain.
The picture is different if the total efforts are considered. Apparently a decrease in the
autocorrelation time is accompanied by an increase in the effort for a single trajectory. From this
finding one can conclude that it is better to mix the local update sweeps. There is, however, one
subtlety which is not reflected in Tab. 4.12. As will be shown later in Tab. 4.14, on the APE-100
architecture a caching of the boson fields allows for an efficient implementation of subsequent gauge
field sweeps. This would reduce the total cost in Sequence IV compared to Sequence 0 by the amount of
three gauge field sweeps. The net effect of such an implementation is that the cost would only slightly
increase for the execution of subsequent gauge field sweeps. Hence, one would find that for the
APE implementation an updating sequence like Sequence IV is superior to the sequence used previously.

Figure 4.33: Jackknife variances as a function of the bin size for the different updating sequences.
(Panels: Sequence I, Sequence II, Sequence III, Sequence IV.)

In conclusion, the optimal updating sequence consists of a mixture of local gauge and boson field sweeps.
One can update the gauge fields for several sweeps while holding the boson field background fixed. It
does not appear to be efficient, however, to perform more than four gauge field sweeps in this way.
In case the caching discussed in App. C is available, this method is indeed effective in reducing the
total effort for a single trajectory, $E_{\rm MV-mults}$, and hence also the total cost for a statistically independent
configuration, $E_{\rm indep}$.





                 $\tau_{\rm int}$ from LDM   Lag $l$      $\tau_{\rm int}$ from Jackknife   Total effort/MV-Mults
  Sequence 0     334.3 ± 65.6                             1441                              342189
  Sequence I     180.5 ± 8.3                 > 550        611.8                             436070
  Sequence II    141.4 ± 10.5                > 400        318.0                             397098
  Sequence III   152.8 ± 10.1                350 - 600    109.8                             498461
  Sequence IV    175.2 ± 11.7                > 700        129.7                             393503

Table 4.12: Integrated plaquette autocorrelation times from the lag-differencing and the Jackknife
methods together with the total costs for a statistically independent configuration.






4.3 Implementation Systems 

The discussion so far has been limited to the machine-independent part of multiboson algorithms.
In practical simulations, however, a particular architecture for the large-scale simulations has to be
selected. This choice will have a considerable impact on the project since the complexity of a multiboson
algorithm is huge compared to other algorithms in use today and the program is expected to run for
several months.

For the purposes of this thesis, two platforms have been given major focus: The first platform was the
APE-100 platform [231], which is also compatible with the APE-1000 [232] architecture^6. The machines
used are installed at DESY/Zeuthen and at the Forschungszentrum Jülich in Jülich, Germany. The
second target platform for the implementation was the ALiCE computer cluster [233] installed at
Wuppertal University. In the following, the machines together with their specific merits and drawbacks
will be presented. The properties of the implementations are discussed and the influence of the numerical
precision on the calculations is examined.

4.3.1 APE Platform 

The APE is a SIMD machine which executes a program in parallel on a number of nodes arranged 
in a three-dimensional mesh. The smallest configuration of nodes is a 2 x 2 x 2-partition, the largest 
configuration available is an 8 x 8 x 8-machine. 

The APE-100 architecture can only execute single-precision floating point operations efficiently in
parallel. However, for special operations like global sums, a library for double-precision addition is available
[234]. Given that a global sum only consumes a fraction of the total runtime of a program, no
performance degradation is to be expected. In contrast to floating point operations, integer
calculations are done globally on a single CPU (with no parallelization possible) and with less efficiency.
Especially integer operations on array indices should be kept at a minimum for the program to run
efficiently. One further obstacle is the fact that the complete multiboson program would be too large
to fit into the memory of the machine; thus only a portion of the complete code can be written on
the APE machines and the remaining parts must be run on conventional parallel computers.
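Why a double-precision accumulator matters for global sums over single-precision data can be seen in a small NumPy experiment. This is only an illustration of the numerical effect; it does not use or imitate the APE library [234], and the array size and value are arbitrary choices made for this sketch.

```python
import numpy as np

# One million single-precision numbers, as they might arise from local
# contributions to a global sum.
values = np.full(1_000_000, 0.1, dtype=np.float32)

# Reference: every element is exactly the float32 nearest to 0.1.
exact = values.size * float(values[0])

# Sequential accumulation in single precision (cumsum adds element by element).
sum_single = float(np.cumsum(values, dtype=np.float32)[-1])

# Same data, but accumulated in double precision.
sum_double = float(np.sum(values, dtype=np.float64))

error_single = abs(sum_single - exact)
error_double = abs(sum_double - exact)
```

The single-precision running sum drifts away from the reference once the accumulator becomes much larger than the summands, whereas the double-precision accumulator remains accurate even though the input data are stored in single precision.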
One further problem is the poor I/O performance of the machine. A save/restore of the complete machine
state requires about 1 to 2 hours (for a typical lattice of size $\Omega = 32 \times 16^3$ and $n_1 = 30$ to $60$), which
means that about 5 to 10% of the whole runtime of a job (which is usually about 20 to 30 hours) would
be wasted on I/O operations. This problem can be overcome, however, by not storing the boson fields
on disk, but rather performing a global heatbath thermalization sweep (see Sec. 4.2.4) to initialize them
prior to a run. This strategy is more efficient (and also more effective in terms of autocorrelation
times of observables) than saving and restoring the complete machine state each time.
The compiler and optimizer technologies lag behind the industry standards of conventional parallel
computers. The CPUs have no caches (only the registers of the floating point processors serve as a
kind of first-level cache). Most optimization strategies (like loop unrolling, prefetching etc.) have to be
implemented manually using the high-level language of the platform. This language is called TAO [235]
and is built on Zz, which is a compiler construction language. However, due to the fact
that Zz is still accessible (to extend the features of TAO and to implement manual loop unrolling etc.),
the system is effectively using a dynamic grammar, which places a lot of responsibility on
the implementor. The drawback is that more complicated programs developed on the machine cannot
easily be ported to different architectures and thus the maintenance costs will soon become a significant
factor. This is no concern for trivial algorithms like the HMC, but will become a serious problem once
a larger source code base is to be established on the machine.

One particular problem is that the sources of the compilers are not publicly available, meaning that bugs
are hard to locate and fix compared to e.g. the GNU compilers^7. This also implies that the development
tools could not be run on modern and fast machines; the typical compile times during the early
phases of the project were of the order of 20 to 30 minutes, resulting in turn-around times of more than
half an hour.

6 Further material on these machines can also be found on the web under http://chimera.roma1.infn.it/ape.html

The advantages of the APE architecture are that a lot of computer time is available and that it has proven
to be the ideal platform for simple algorithms like the HMC, which essentially rely only on the
implementation of an efficient matrix-vector multiplication. Furthermore, the platform scales very efficiently
since the communication overhead is minimal. This results in a rather small latency and is thus a
counterpoint to the workstation clusters available today (see [236] for a different application of
workstation clusters which demonstrates the same properties). Several of the shortcomings listed above have improved
with the advent of the APE-1000 architecture [232], but experience is still too sparse to include major
results in this thesis. The APE-1000 architecture still has problems regarding the maximum machine
size and the fact that double precision calculations will introduce a performance hit of a factor of four
in the peak performance.

The first implementation of the TSMB algorithm used in this thesis has been written on the Q4open
machine located at the NIC in Jülich, Germany^8. The machine had a configuration of $2 \times 4 \times 4$ nodes
and served as the major development platform until Spring 2000. Sadly, it went out of service due to
a defective board.

4.3.2 ALiCE Cluster 

At a later stage of this project, development was shifted to the ALiCE computer cluster installed at
Wuppertal University^9, where modern compilers and development tools are available. Most results have
in fact been obtained on this machine. The cluster consists of 128 Compaq DS10 workstations, each
equipped with a 21264 Alpha processor running at 616 MHz. The size of the second level cache is 2
Mbyte. The network is based on Myrinet with a peak performance of 1.28 Gbit/s.
The code written on this platform was immediately usable on other parallel machines, like the CRAY
T3E located at the ZAM, Jülich^10, and the Nicse cluster which is also located at the NIC institute.
The program has been proven to run also on a cluster of standard, Intel-based workstations installed
at Wuppertal University. The network of these machines is based on standard Ethernet, which made
the installation not competitive from a performance point of view, but very attractive for development
and debugging purposes. This illustrates the particular advantage of standard tools over proprietary
solutions: although the hardware costs of the latter might be smaller (the situation might be less clear once
development costs are included, however), the Total Cost of Ownership (TCO) may outweigh the former price.
In fact, the total costs for maintenance and software may become larger than the pure hardware costs.

4.3.3 Accuracy Considerations and Test Suites 

The complexity of code for the multiboson algorithm is high compared to the case of other algorithms 
in use today like the HMC. The multiboson code on the APE machine (together with the production 
environment) amounted to more than 11000 lines of code; the program on the ALiCE cluster consisted 
of 17000 lines of code (for the single-node and the parallel version) and the administrative software 
required another 17000 lines. For the measurement of hadronic masses, a program with a size of 29000 
lines of code was required. This clearly calls for efficient test suites to track down
possible sources of errors.

7 Further information and resources related to this system can be found under
http://www.gnu.org/software/gcc/gcc.html
8 See http://www.fz-juelich.de/nic/ for further information on the John von Neumann-Institut für Computing
9 A large contribution to this program has been provided by Prof. I. Montvay, DESY, Hamburg
10 The official homepage of the Central Institute for Applied Mathematics can be found at
http://www.fz-juelich.de/zam/
Local Fermionic Action

An important part of the program consists of the implementation of the fermionic action. Explicit forms
of the different expressions required for the local action are given in App. C. The local forms have to
be consistent with the implemented matrix-vector multiplication 11 . Then one can alter a single link at
an arbitrary site and compare the results of Eqs. (3.69), (C.16) and (C.17). The second test consists of
changing a single color-spinor with an arbitrary index $j$, $1 \le j \le n_1$, and again comparing the results
of Eqs. (3.69) and (C.16). The residual error should only be limited by the machine precision. This can also
act as a test on whether the single precision of the APE machines is a real limitation.
In fact, for the application of Eq. (C.17) one has to sum up $n_1$ terms in single precision to get a complex
$3 \times 3$ matrix, which may already introduce difficulties at moderate values of $n_1$. To examine the errors as
they occur in practical computations, a very small lattice suffices, since the major
source of numerical errors is the local update part. Therefore, a simulation has been performed
using a thermalized configuration on an $\Omega = 4^4$ lattice at the physical parameters given in Tab. 4.1 on
the QH1 board at DESY/Zeuthen. The first polynomial has been chosen with order $n_1 = 32$ and
$[\epsilon, \lambda] = [3 \times 10^{-3}, 3]$. The maximum numerical error in the three expressions is displayed in Figure 4.34,
where the distribution of the inaccuracies is shown. They have been obtained by considering separately
each site on a single node during a gauge field updating sweep. The error is evidently bounded from
above and only rarely exceeds $1 \times 10^{-6}$. In the development phase such a plot turned out to be
extremely useful, since identifying the sites which give a huge numerical error can help to track down
program bugs rather easily.
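The effect of accumulating many terms in 32-bit arithmetic can be illustrated with a small, self-contained sketch. It is not the APE code: the terms are random numbers rather than contributions to Eq. (C.17), and the helper names (`to_float32`, `sum_single_precision`, `relative_error`) are hypothetical. Single precision is emulated by rounding every partial sum through a 32-bit representation and comparing against an accurately rounded 64-bit reference sum.

```python
import math
import random
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single_precision(terms):
    """Accumulate terms, rounding every partial sum to 32 bits."""
    total = 0.0
    for t in terms:
        total = to_float32(total + to_float32(t))
    return total


def relative_error(n_terms: int, seed: int = 0) -> float:
    """Relative deviation of the emulated 32-bit sum from a reference sum."""
    rng = random.Random(seed)
    # Illustrative data only: uniform random terms, not lattice quantities.
    terms = [rng.uniform(0.0, 1.0) for _ in range(n_terms)]
    reference = math.fsum(terms)
    return abs(sum_single_precision(terms) - reference) / reference


if __name__ == "__main__":
    for n in (32, 160, 1024):
        print(n, relative_error(n))
```

Running the loop for increasing numbers of terms gives a feeling for how the rounding error behaves as the sums get longer; the actual size of the effect in the simulation is the one shown in the figure.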




[Plot omitted; horizontal axis: numerical inaccuracy (32-bit)]

Figure 4.34: Maximum numerical error between the different implementations of the fermionic actions 
for local changes of the gauge field. 



The second message to be learned from Fig. 4.34 is that the 32-bit precision used is not an obstacle
in actual simulations: the systematic error introduced by the local gauge field updates is clearly
under control.

The same test can be done for local changes of the boson field by considering the expressions Eqs. (3.69)
and (C.16). The corresponding results are displayed in Fig. 4.35. The conclusions drawn for the gauge
field case evidently also hold for the boson field case.

The Inverse Square-root 

Another problem is posed by the residual error of the inverse square root required for the TSMB
algorithm, Eq. (3.76). As has already been discussed in Sec. 4.1.1, the systematic error can be computed
via Eq. (4.4) and must be less than one percent.

11 Strictly speaking, this is not a necessity for the program to be correct. One can implement the global matrix-vector 
multiplication Q(y,x) with another convention than that used for the local actions. However, in this case the tests 
suggested here will fail. Thus, it appears to be a good idea to keep the actions consistent and proceed as discussed 




Figure 4.35: Maximum numerical error between the implemented actions for local changes of the boson 
field. 



The question arises how accurate the approximation can be at best. In the case of
the HMC algorithm one has a residual error of 2% if one uses 32-bit floating point numbers on an
$\Omega = 40 \times 24^3$ lattice (see Sec. 3.5.2). One may therefore ask how large the lattice may be in the TSMB
case if one only has access to single precision on a particular architecture. This question is answered
by Fig. 4.36, where the residual error $\|R^{n_3}_{160}(\tilde{Q}^2)\|$ of the noisy correction step is plotted vs. the order
of the third polynomial, $n_3$ (the order of the second polynomial has been chosen to be $n_2 = 160$). The calculation has
been performed using both 32-bit (single precision) and 64-bit (double precision) arithmetic. The lattice
sizes which have been considered were $\Omega = 8^4$ and $\Omega = 32 \times 16^3$.

Figure 4.36: Residual error of the noisy correction $\|R^{n_3}_{160}(\tilde{Q}^2)\|$ vs. the number of iterations $n_3$ for
two lattice sizes. Both single (32 bit) and double precision (64 bit) have been used.



On the $\Omega = 8^4$ lattice the results coincide up to orders of about $n_3 = 200$. Beyond this point, the
single precision result deviates from the double precision curve. Finally, the single precision numbers
saturate at $n_3 > 250$. The accuracy which can be reached is still satisfactory since it is bounded by
$\|R^{n_3}_{160}(\tilde{Q}^2)\| \approx 2.1 \times 10^{-4}$. Hence, one can conclude that single precision is adequate on an $\Omega = 8^4$ lattice.

On the $\Omega = 32 \times 16^3$ lattice, the single precision result saturates already at $\|R^{n_3}_{160}(\tilde{Q}^2)\| \approx 4.6 \times 10^{-3}$.
This is more than on the $\Omega = 8^4$ lattice, but it is still sufficiently small. The double precision curve
also shows a saturation, but not before $n_3 = 400$ and an accuracy of $\|R^{n_3}_{160}(\tilde{Q}^2)\| \approx 2.1 \times 10^{-5}$ has been
achieved. Again, this is completely acceptable.

The conclusion to be drawn from this test is that single precision arithmetic is not at all a problem on
an $\Omega = 8^4$ lattice, and is still acceptable on lattices as large as $\Omega = 32 \times 16^3$. On larger lattices, however,
single precision may no longer be feasible and one should refrain from using 32-bit arithmetic.
4.3.4 Architectures and Efficiency

All architectures discussed so far have certain advantages and disadvantages. In this section, the
efficiencies of the implementations are compared against each other. Table 4.13 shows the execution
times of the different parts of the multiboson implementations for a selection of architectures. The
physical parameters of the runs are given in Tab. 4.1, while the algorithmic parameters are listed in
Tab. 4.3 with $n_1 = 20$. The lattice size was again $\Omega = 8^4$.

In case of the ALiCE cluster and the CRAY T3E, an eight-node partition with parallelization in z-
and t-direction has been employed, which results in local lattices of $\Omega_{\rm loc} = 2 \times 8 \times 8 \times 4$. Table 4.13
displays the execution times of the different algorithms employed. The last line shows the ratio of local
update sweeps to the global matrix-vector multiplications taking place in the noisy correction step. This
is the key quantity of interest, where the influence of a particular architecture is most clearly exposed.



Algorithm            Time on ALiCE   Time on CRAY T3E
Mat-Vect. mult       0.04938 s       0.06542 s
Noisy correction     17.777 s        23.551 s
Boson heatbath       1.10352 s       4.02112 s
Boson overrel.       1.06641 s       3.79800 s
Gauge heatbath       1.17969 s       4.12083 s
Ratio local/global   0.3748          1.0045

Table 4.13: Execution times of several parts of the TSMB algorithm on the ALiCE cluster and the
CRAY T3E.

For the APE-100, an eight-node Q1 board has been used with a lattice size of $\Omega = 4^4$. The local
lattices were $\Omega_{\rm loc} = 2 \times 2 \times 2 \times 4$ per node. The resulting execution times of these algorithms are
quoted in Tab. 4.14. In contrast to the above implementation, the APE program uses an efficient
caching strategy, where the contributions of the boson fields to the local staples, Eq. (C.17), which are
unchanged by a gauge field sweep are held in memory, see App. C for details. This reduces the
computational costs for repeated local gauge field sweeps. The time required for the initialization of the
gauge field sweep is given in the fifth line of Tab. 4.14. This strategy has only been implemented on
the APE, but in principle this caching scheme is machine-independent and could also be implemented
in the other program.



Algorithm            Time
Mat-Vect. mult       0.01354 s
Noisy correction     4.876 s
Boson heatbath       1.634 s
Boson overrel.       1.482 s
Init gauge sweep     1.695 s
Gauge heatbath       0.124 s
Ratio local/global   1.6503

Table 4.14: Execution times of the parts of the TSMB algorithm on an APE board.

When comparing the ALiCE cluster and the CRAY T3E, one realizes that the CRAY T3E architecture has a
more efficient network, but lacks the large second-level caches of the ALiCE nodes. This explains why
the communication-intensive matrix-vector multiplication is very efficient on the
CRAY T3E. On the other hand, the local update sweeps are very cache intensive and communication
between nodes only plays a minor role. Therefore, the local update sweeps contribute a much smaller
fraction to the total runtime on the ALiCE cluster, while they account for about 50% of the total
sweep time on the T3E.

The APE-100 architecture shows an even more prominent dominance of the local update sweeps, which
consume about 2/3 of the total runtime. This can be attributed to the fact that the CPUs have no
second-level cache at all and the data has to be fetched from memory each time. Therefore the local
update sweeps are rather inefficient, while the global matrix-vector multiplications are very efficient.
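The ratios in the last line of Tab. 4.13 can be cross-checked with a short sketch. It assumes that one trajectory consists of 1 boson heatbath, 3 boson overrelaxation and 2 gauge heatbath sweeps per noisy correction; this sequence is an assumption taken over from the MB-HB updating sequence quoted later in the thesis, not something stated in the table itself. The names `TIMES`, `SWEEPS`, `local_to_global_ratio` and `local_fraction` are hypothetical.

```python
# Execution times in seconds, copied from Tab. 4.13.
TIMES = {
    "ALiCE": {"noisy": 17.777, "boson_hb": 1.10352,
              "boson_or": 1.06641, "gauge_hb": 1.17969},
    "CRAY T3E": {"noisy": 23.551, "boson_hb": 4.02112,
                 "boson_or": 3.79800, "gauge_hb": 4.12083},
}

# ASSUMPTION: number of local sweeps per noisy correction step.
SWEEPS = {"boson_hb": 1, "boson_or": 3, "gauge_hb": 2}


def local_to_global_ratio(times, sweeps=SWEEPS):
    """Time spent in local sweeps divided by the noisy correction time."""
    local = sum(count * times[name] for name, count in sweeps.items())
    return local / times["noisy"]


def local_fraction(ratio):
    """Fraction of the total trajectory time spent in local sweeps."""
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for machine, times in TIMES.items():
        r = local_to_global_ratio(times)
        print(f"{machine}: ratio {r:.4f}, local fraction {local_fraction(r):.2f}")
    # Ratio quoted for the APE board in Tab. 4.14.
    print(f"APE-100: local fraction {local_fraction(1.6503):.2f}")
```

Under this assumption the quoted ratios for the ALiCE cluster and the CRAY T3E are reproduced, and the conversion to a runtime fraction gives about one half for the T3E and roughly 0.6 for the quoted APE-100 ratio.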

4.4 Summary 

The TSMB algorithm requires three approximations to an inverse power of the Hermitian Wilson matrix.
All approximations are performed with static inversion algorithms as discussed in Sec. 3.6.1. The
first is a crude approximation to an inverse power of $\tilde{Q}^2$ with order $n_1$. The second is a refined
approximation to the same function with order $n_2$. The third one approximates the inverse square root
of the polynomial with which the second approximation is performed.

The best method to find the first polynomial consists of applying the GMRES algorithm (see Sec. 3.6.3)
to one or more thermalized gauge field configurations at the physical point one is interested in. This
method is limited by the numerical precision of the architecture used to generate the polynomial. The
other method consists of using a quadratically optimized polynomial. The latter choice only requires
rough knowledge of the spectral bounds.

The second and third polynomials have to be quadratically optimized polynomials. The value of $\epsilon$ for
the product of the first and the second polynomial should be chosen such that it is slightly smaller than
the average smallest eigenvalue. Its order has to be adjusted such that the reweighting factor does not
fluctuate by more than a few percent. Since the convergence is exponential, this can be achieved without
too much effort.

The third polynomial must have a sufficiently high order such that its corresponding determinant norm,
Eq. (4.4), or rather the resulting systematic error, Eq. (4.7), never exceeds values of $\sim 10^{-3}$.
The order of the first polynomial influences the acceptance rate of the correction step. It appears to
be safe to make the acceptance rate somewhat smaller than 50%. The motion in phase space depends
linearly on the number of boson fields. The decreased acceptance rate counteracts the increased mobility
in phase space and the resulting efficiency does not appear to depend on the acceptance rate. Since the
numerical cost of a single trajectory is proportional to the number of boson fields, the total cost for a
statistically independent gauge field configuration, Eq. (4.11), is then given by

$E_{\mathrm{indep}} \propto n_1^2$ . (4.12)

This formula should apply for runs at different parameters and different orders $n_1$ if the acceptance
rates are held constant and if the field updates dominate the time needed for a single trajectory.
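As a minimal illustration of the quadratic cost law, the following sketch evaluates the relative cost of two runs that differ only in the order of the first polynomial. The function name `relative_cost` and the numerical orders used in the example are hypothetical; the sketch assumes that the conditions stated above (constant acceptance rate, field updates dominating the trajectory) hold.

```python
def relative_cost(n1_new: int, n1_old: int) -> float:
    """Cost ratio for one independent configuration, assuming E ~ n1**2."""
    if n1_new <= 0 or n1_old <= 0:
        raise ValueError("polynomial orders must be positive")
    return (n1_new / n1_old) ** 2


if __name__ == "__main__":
    # Hypothetical example: doubling the number of boson fields.
    print(relative_cost(40, 20))
```

Doubling the number of boson fields thus quadruples the estimated cost, one factor of two from the cost per trajectory and one from the slower decorrelation.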
A trajectory is given by the updating sequence, i.e. the transition function of the Markov chain. It 
consists of a number of update sweeps for the boson fields and local update sweeps of the gauge field. 
It has turned out that the boson field updates should be mixed with the gauge field updates, but a 
number of subsequent gauge field updates allows a similarly efficient decorrelation. Each trajectory is 
completed by a noisy correction step. In view of the fact that the acceptance rate is rather independent 
from the local gauge field updates, the optimal efficiency can be achieved by performing a larger number 
of local gauge field updates between two noisy corrections. Thus, the sequence should be arranged in 
such a way that the field updates dominate the total runtime. 

Furthermore, it is important to identify the machine architecture which meets the specific demands of
MB algorithms. A conventional massively parallel machine with a network similar to the CRAY
architecture, but small caches on the nodes, does not perform well with respect to the local update
sweeps, although its efficient network allows for a rather efficient matrix-vector multiplication. However, as has
been discussed in Sec. 4.2.4, one can (and, in fact, one should) always arrange the updating sequence
in such a manner that the local updating sweeps dominate the total runtime of the code. Hence, the
machine best suited for TSMB calculations is found to be a cluster of workstations with large cache
and standard programming tools.

The APE-100 system does not perform very well in the local update sweeps and suffers from the
problem that the code written for it cannot so far be used on any other machine. The former
problem may be overcome with the advent of the APE-1000 architecture, which may (due to the large
number of CPU registers) reduce the number of memory accesses required. The latter problem cannot
be expected to be solved before the advent of the APEnext platform [237]. Due to the complexity
of the algorithm it appears reasonable to first write a reference implementation in a standard
language on a different architecture before starting with the coding on the APE platform.
There is still room for further improvements, however. In particular, improving the approximation
scheme of the third polynomial would overcome the limitations of the current implementations
and should make the algorithm applicable to larger lattices even with single precision arithmetic. A
different place where further improvements are in order is the first polynomial. Although the optimal
way to find its coefficients has been identified, this method still requires high numerical accuracy of the
implementation system.
In the first section, Sec. 5.1, the variant of the multiboson algorithm with TSMB correction step, which
has been studied in detail in Chapter 4, is applied at different physical locations in parameter space.
At all points, results from the HMC method are available. This allows for a direct comparative study.
Section 5.2 directly compares the efficiencies of MB algorithms with that of the HMC. This investigation
is not limited to the MB variant discussed so far, but also covers the non-Hermitian variant with
UV-filtering and gauge field overrelaxation as proposed by de Forcrand in [220] (cf. Sec. 3.5.5). These
results might be of importance for future simulations of gauge theories with dynamical fermions, see
e.g. [238]. The study is carried out on equal lattice volumes and with identical physical parameters.
Hence, it allows one to probe the scaling of the algorithms as the continuum limit is approached.



5.1 Simulation Runs at Different Parameters 

The TSMB variant of the multiboson algorithm is applied to situations at different physical points in 
parameter space with two dynamical fermion flavors. A direct comparison with the HMC algorithm is 
given. The latter acts as a benchmark for the alternative proposals. 

The HMC simulations have all been carried out on volumes $\Omega = 32 \times 16^3$. The physical parameters of
the various runs performed here and in the rest of this chapter are compiled in Tab. 5.1 and have been
taken from [148], see also [228, 227]. The second line is identical to Tab. 4.1 in Chapter 4.



Bare parameters                 Physical parameters
$N_f$   $\beta$   $\kappa$      $(am_\pi)$    $(am_\rho)$   $m_\pi/m_\rho$   $a$/fm
2       5.5       0.158         0.5528(40)    0.6487(55)    0.8522(95)       0.166
2       5.5       0.159         0.4406(33)    0.5507(59)    0.8001(104)      0.141
2       5.5       0.160         0.3041(36)    0.4542(78)    0.6695(138)      0.117
2       5.6       0.156         0.4464(27)    0.5353(42)    0.8339(66)       0.137

Table 5.1: Bare and physical parameters for the comparison between HMC and TSMB. Also cf. Sec. 4.1.



The multiboson algorithm has been operated on rather small lattices with volumes $\Omega = 8^4$ and $\Omega =
16 \times 8^3$. Measurements of the physical masses can be expected to differ from those on the larger volumes
due to finite-size effects [148]. Hence, runs on such small lattice sizes can only be preliminary studies
which need to be supplemented later by runs on larger volumes. The quantity under consideration here
is the average plaquette. This observable will exhibit only a weak dependence on the lattice volume.
These runs corroborate the tests performed in Sec. 4.3.3 of Chapter 4.

Table 5.2 gives the total statistics in numbers of trajectories entering the analysis. The complete data
set as given in Tab. 4.4 has been exploited; therefore the statistics in this particular case are enormous.
Furthermore, the machines on which the data have been sampled are shown.

Table 5.3 lists the resulting values for the average plaquette together with their standard errors for the
different algorithms used. The standard errors have been obtained using the Jackknife method on the
plaquette time series.
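A Jackknife error estimate for the mean of a time series can be sketched as follows. This is a generic textbook version, not the analysis code used for the tables; the function name `jackknife_mean_error` and the optional blocking parameter `block_size` are assumptions introduced for illustration. Blocking the series before resampling is a common way to account for autocorrelations between successive measurements.

```python
import math


def jackknife_mean_error(series, block_size=1):
    """Return (mean, jackknife standard error) of a time series.

    The series is first averaged in consecutive blocks of `block_size`
    measurements; trailing measurements that do not fill a block are
    dropped. The jackknife is then applied to the block averages.
    """
    n_blocks = len(series) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    blocks = [
        sum(series[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    total = sum(blocks)
    # Leave-one-out averages: the mean with block i removed.
    loo = [(total - b) / (n_blocks - 1) for b in blocks]
    loo_mean = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in loo)
    return total / n_blocks, math.sqrt(variance)


if __name__ == "__main__":
    mean, err = jackknife_mean_error([1.0, 2.0, 3.0, 4.0, 5.0])
    print(mean, err)
```

For the plain mean, the jackknife error coincides with the usual standard error of the block averages; its advantage shows up for derived quantities such as fitted masses, where the same resampling can be applied to an arbitrary estimator.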

The plaquette values obtained with the HMC coincide with those generated by the TSMB algorithm 
up to three digits 1 . 

1 The residual deviation is caused by finite-size effects. However, it may also indicate that the autocorrelation times are
Bare parameters
$\beta$   $\kappa$   Volume $\Omega$    Machine          Algorithm   Trajectories
5.5       0.158      $8^4$              ALiCE cluster    TSMB        17703
5.5       0.158      $32 \times 16^3$   APE-100 QH4      HMC         3042
5.5       0.159      $8^4$              ALiCE cluster    TSMB        448055
5.5       0.159      $32 \times 16^3$   APE-100 QH4      HMC         4518
5.6       0.156      $16 \times 8^3$    APE-100 Q4       TSMB        4424
5.6       0.156      $32 \times 16^3$   APE-100 QH2      HMC         4697

Table 5.2: Machines and statistics for the test runs at different physical parameters.



Bare parameters
$\beta$   $\kappa$   Plaquette/HMC   Plaquette/TSMB
5.5       0.158      0.55546(6)      0.55380(41)
5.5       0.159      0.55816(4)      0.55909(27)
5.6       0.156      0.56988(2)      0.56865(86)

Table 5.3: Average plaquette values with their standard errors obtained from the different samples.



In conclusion, the TSMB implementation indeed reproduces the plaquette values of the HMC. This comparison
demonstrates the correct implementation and the correct execution of the MB algorithm.

5.2 Efficiency of Multiboson Algorithms 

In this section, three different algorithms for the simulation of Lattice QCD with two dynamical fermion
flavors are compared. The physical parameters are the ones given in the second line of Tab. 5.1. In
all cases, the lattice volume has been chosen to be $\Omega = 32 \times 16^3$. This allows for a measurement of
hadronic masses and sets the stage for a direct comparison of the algorithms.

The HMC algorithm has been introduced in the previous section, see Tab. 5.2. The two variants of
MB algorithms used are the implementation with quadratically optimized polynomials discussed in the
previous section, Sec. 5.1, and an implementation based on the UV-filtered non-Hermitian approximation.
The latter code has been written by M. D'Elia and Ph. de Forcrand. Both programs have been
implemented on the APE-100 QH4 installed at DESY/Zeuthen. While the former variant uses
heatbath sweeps for the gauge field updates (it will be called "MB-HB" in the following) and the
TSMB correction step, the latter uses overrelaxation sweeps (in the following abbreviated as "MB-OR")
for the gauge field and an exact correction step. To reflect the different updating strategies used, the
algorithms are named after the corresponding local gauge field updates.

5.2.1 Tuning the MB-HB Algorithm 

For the MB-HB algorithm, the question arises how the acceptance rate of the correction step changes
with the volume and what the consequences for the polynomial orders are. Table 5.4 shows two different
choices of parameters and the corresponding acceptance rates. The choice $n_1 = 60$ gives an
acceptance rate of about 50%, while for smaller values of $n_1$ the acceptance rate decreases rather
fast. We observe a significant volume dependence since in the case $\Omega = 8^4$ one only needs $n_1 = 20$

underestimated and the actual errors of the plaquettes are still larger
to get similar acceptance rates (consult Tab. 4.4). For the simulation run to be presented below the
parameters from the second line have been taken.



Algorithmic parameters
$n_1$   $n_2$   $n_3$   $[\epsilon, \lambda]$          Acc. rate
42      160     250     $[7.5 \times 10^{-4}, 3]$      15.2%
60      150     250     $[6 \times 10^{-4}, 3]$        49.4%

Table 5.4: Different polynomial parameters and the resulting acceptance rates for the MB-HB algorithm.

The updating strategy is shown in Tab. 5.5 and is chosen similar to Tab. 4.3 in the previous chapter.
Note that, as has been discussed in Sec. 4.2.4, the boson fields do not have to be restored if the correction
step rejects a proposed gauge field configuration.



Updates/Trajectory 

1 boson HB, 3 boson OR, 2 gauge heatbath, 1 noisy corr. 



Table 5.5: Updating sequence for the MB-HB algorithm. 

The heatbath algorithm (cf. Sec. 3.4.2) has been employed for the gauge field updates. It is important
to notice that a single-hit heatbath algorithm is sufficient if the scheme from [157] is used. In fact,
acceptance rates exceeding 99% have been observed when generating the distribution for $a_0$ (consult
Sec. 3.4.2 for the notation).

In the version of the program employed, the contribution of the unit submatrix from the even points in
the noisy vectors has been included in the noisy correction step. This resulted in a systematic error of
about 2.48% when applying Eq. (4.7). If this had not been done, one could have reduced the polynomial
orders $n_2$ and $n_3$. We do not expect this to influence either the stochastic averages or the autocorrelation
times in terms of sweeps, however.

5.2.2 Tuning the MB-OR Algorithm 

In contrast to the former variant, the other MB algorithm does not make use of a polynomial
approximation in the correction step but performs an adaptive inversion. As has been discussed in Sec. 3.5.3,
this requires a nested iteration of an adaptive inversion and the polynomial $P_{n_1}$, which acts as a
preconditioner. This approach has the great practical advantage that no multicanonical reweighting for
the measurement of observables is necessary, but has the shortcoming that the effort increases
once configurations with exceptionally small eigenvalues are encountered. In the present case, however,
we do not expect this to have a major influence.

The power of the GMRES polynomials in conjunction with UV-filtering for the non-Hermitian Wilson
matrix is demonstrated if one considers the order of the polynomial $P_{n_1}(\cdot)$ required to arrive at an
acceptance rate of 60.3%. The polynomial needed in this case has an order of only $n_1 = 24$.
Thus, the number of boson fields could be reduced by a factor of 2.5 (and even more if one
aims for an acceptance rate of about 50%). This number takes into account the combined effect of
using the non-Hermitian Wilson matrix, employing the expansion of Eq. (3.95), and using the GMRES
algorithm instead of quadratically optimized polynomials. To actually find this polynomial, however, a
thermalized gauge field configuration had to be provided from the HMC run. Had this configuration not
been available prior to the run, the thermalization would have had to be performed with a non-optimal
polynomial instead. This would have increased the total investment into the algorithm.
Updates/Trajectory

1 gauge OR, 5 x ( 1 boson OR, 1 gauge OR ), 

1 boson field global quasi-HB, 
5 x ( 1 gauge OR, 1 boson OR ), 1 gauge OR, 
1 noisy corr. 



Table 5.6: General algorithmic parameters for MB-OR run. 



The precise update sequence for a single trajectory is given in Tab. 5.6. In contrast to the former
multiboson implementation discussed in Sec. 5.2.1, only overrelaxation sweeps (see Sec. 3.4.3) have
been used for the gauge field. Although this algorithm alone is non-ergodic, ergodicity is ensured by
the boson field global quasi-heatbath (this method has been discussed in Sec. 3.4.2). In particular,
instead of only two gauge field updates between two correction steps, in total 12 gauge field updates are
being run. However, the mixing of gauge and boson field updates makes it necessary to restore both kinds of
fields in case the correction step rejects the current configuration. This results in much larger memory
requirements. As has been argued in Sec. 4.2.4, one can expect that this updating sequence results in
a faster decorrelation than the updating sequence in Tab. 5.5.

In conclusion one can expect that the MB-OR implementation may perform better since both the 
number of boson fields is reduced significantly and the updating sequence ensures a faster decorrelation. 

5.2.3 Direct Algorithmic Comparison 

The observables under consideration were the average plaquette, the (non-singlet) pseudoscalar meson
mass (denoted as pion $\pi$) and the (non-singlet) vector meson mass (denoted as rho-meson $\rho$). Their
expectation values (for the three different algorithms) together with their standard errors are shown in
Tab. 5.7. The hadronic masses have been taken from Orth [148].



Algorithm   Trajectories   Configurations   Plaquette     $(am_\pi)$    $(am_\rho)$
HMC         3521           140              0.55816(4)    0.4406(33)    0.5507(59)
MB-HB       5807           108              0.55819(6)    0.448(10)     0.578(17)
MB-OR       6217           177              0.55804(7)    0.4488(37)    0.5635(83)

Table 5.7: Average plaquette and hadronic masses for the three different sampling algorithms for Lattice
QCD used.

The plaquette values agree within errors, while the meson masses agree within at most two standard
deviations. The statistics for the MB-HB algorithm are worse than in the other cases.
Table 5.8 shows the resulting total efforts as defined in Eq. (4.11) for the three algorithms employed.
The quantities under consideration are the meson masses. The efforts have been computed by Orth
in [148] using the Jackknife method. See also [239] for the latest results.

Finally, the plaquette is investigated. The time series for the HMC method at these physical parameters
has already been examined in Sec. 4.2.1. Figures 5.1 (this figure is identical to Fig. 4.12), 5.2 and 5.3 show
the autocorrelation functions and the corresponding autocorrelation times computed for the plaquette
histories from the HMC, the MB-HB and the MB-OR algorithms respectively.

The efforts for each single trajectory, the corresponding autocorrelation times, and the total efforts to
obtain one statistically independent plaquette measurement are listed in Tab. 5.9. Note that, as
has been pointed out in Sec. 5.2.1, the effort for a single trajectory could have been reduced in the
case of the MB-HB algorithm. The integrated autocorrelation times have been determined using the
windowing procedure which has been discussed in Sec. 3.2.5.
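A windowing estimate of the integrated autocorrelation time can be sketched as follows. The details of the procedure used in the thesis are not reproduced here; the sketch assumes the common self-consistent window, in which the sum over the normalized autocorrelation function is cut off at the first lag $W$ with $W \ge c\,\tau_{\rm int}(W)$, and the convention $\tau_{\rm int} = 1/2 + \sum_t \rho(t)$. The names `tau_int_windowed` and `ar1_series`, the constant `c = 6`, and the synthetic test series are all assumptions made for illustration.

```python
import random


def tau_int_windowed(series, c=6.0):
    """Integrated autocorrelation time with a self-consistent window.

    Returns (tau_int, window), where tau_int = 1/2 + sum_{t=1}^{W} rho(t)
    and W is the first lag satisfying W >= c * tau_int(W).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    tau = 0.5
    for lag in range(1, n // 2):
        c_lag = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        tau += c_lag / c0
        if lag >= c * tau:
            return tau, lag
    raise RuntimeError("series too short for the window condition")


def ar1_series(phi, length, seed=1):
    """Synthetic AR(1) series x_t = phi * x_{t-1} + noise (test data only)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(length):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    # For an AR(1) process the exact value is 1/2 + phi / (1 - phi).
    print(tau_int_windowed(ar1_series(0.8, 20000)))
```

For the synthetic series with `phi = 0.8` the exact value is 4.5, which gives a simple way to check such an estimator before applying it to a plaquette history.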
Efforts for meson masses
Algorithm   $E_{\rm indep}(m_\pi)$/MV-mults   $E_{\rm indep}(m_\rho)$/MV-mults
HMC         < 810000                          < 810000
MB-HB       > 2000000                         > 2000000
MB-OR       264000                            352000

Table 5.8: Numerical efforts for meson masses obtained by employing three different sampling
algorithms. Courtesy B. Orth.




Figure 5.1: Autocorrelation function and the corresponding integral as a function of the cutoff for the
plaquette history from the HMC run. This figure is identical to Fig. 4.12.



The time series from the HMC algorithm contains 320 autocorrelation times, which is sufficient to obtain
a reliable estimate for the autocorrelation time. The MB-OR algorithm was run for 140 autocorrelation
times, which should be enough for a good estimate. The MB-HB algorithm, however, has only
accumulated of the order of $O(40)$ autocorrelation times if the value of $\tau_{\rm int}$ is correct. This is too short for
a safe determination of $\tau_{\rm int}$; therefore, these numbers have to be taken with a grain of salt. One cannot
be sure that the longest mode has already been measured in the time series, but one can consider the
autocorrelation mode giving rise to this value as a lower limit of the true autocorrelation time.
As has already been anticipated, the MB-HB algorithm cannot compete with the MB-OR algorithm
at this point in parameter space. The observed autocorrelation time for the plaquette in terms of
trajectories is a factor of about 3.2 larger than for the MB-OR algorithm. However, the statistics
which went into the MB-HB run are not yet sufficient. Given the large difference in the number of
boson fields and the small number of gauge field updates between the noisy corrections during each
trajectory compared to the MB-OR algorithm, the efficiency may consequently be even worse than







Figure 5.2: Autocorrelation function and integrated autocorrelation time for the plaquette histories of 
the MB-HB algorithm. 
Figure 5.3: Autocorrelation function and integrated autocorrelation time for the plaquette histories of
the MB-OR algorithm. 



Algorithm   Effort/Trajectory   $\tau_{\rm int}$(Plaquette)   $E_{\rm indep}$(Plaquette)/MV-mults
HMC         16200               11.0 ± 0.4                    356400 ± 12960
MB-HB       2000                141.8 ± 32.7                  567200 ± 130800
MB-OR       4400                44.7 ± 3.4                    393360 ± 29920

Table 5.9: Autocorrelation times and efforts for independent plaquette measurements for the three
different algorithms at $\beta = 5.5$ and $\kappa = 0.159$.



what is expressed in Tab. 5.9. The results for the meson masses (cf. Tab. 5.8) are compatible with
these findings. Again, the numbers should only be considered to be lower limits and may not capture
the longest mode of the time series in question.
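
The last column of Tab. 5.9 can be cross-checked with a few lines of code. The sketch below assumes that the effort for one statistically independent measurement is $E_{\text{indep}} = 2\,\tau_{\text{int}} \times$ (effort per trajectory), with the error propagated from $\tau_{\text{int}}$ only; this relation is inferred from the tabulated numbers, and the function name is hypothetical.

```python
def independent_effort(effort_per_trajectory, tau_int, tau_int_err):
    """Return (E_indep, error) in matrix-vector multiplications.

    Assumes E_indep = 2 * tau_int * effort_per_trajectory and treats the
    effort per trajectory as exact, so only the error of tau_int propagates.
    """
    scale = 2.0 * effort_per_trajectory
    return scale * tau_int, scale * tau_int_err


# (effort per trajectory, tau_int, error of tau_int) from Tab. 5.9
runs = {
    "HMC": (16200, 11.0, 0.4),
    "MB-HB": (2000, 141.8, 32.7),
    "MB-OR": (4400, 44.7, 3.4),
}

efforts = {name: independent_effort(*values) for name, values in runs.items()}
for name, (value, error) in efforts.items():
    print(f"{name:6s} E_indep = {value:9.0f} +- {error:7.0f}")
```

Under this assumption all three rows of the table are reproduced, including the quoted errors.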

When comparing the MB-OR and the HMC algorithms regarding the plaquette autocorrelation times,
one finds that the algorithms are similarly efficient. In the case of the meson masses, the problem
occurs that for the HMC only the configuration at every 25th trajectory has been analyzed. There is no
residual autocorrelation in the sample; therefore the actual autocorrelation times may be even smaller
than the numbers given in Tab. 5.8.

In conclusion, at the physical point given in Tab. 4.1 the MB-OR algorithm performs at least as well as
the HMC for the decorrelation of the hadronic masses. For the measurement of hadronic masses, the
results are similar. However, one finds that the tuning of MB algorithms is crucial for their performance.

5.2.4 Scaling Behavior of Algorithms 

The ultimate goal of Lattice QCD simulations has been formulated in [10] (cf. Sec. 2.5.3), namely the
demand to simulate with three light fermionic flavors at quark masses of about $m_s/4$. For this goal
to be reached, an algorithm is required which has a sufficiently weak critical scaling exponent when
approaching the chiral regime (see Eq. (3.22)). The challenge is now to apply the algorithms from the
previous comparison to a point in phase space with lighter fermion masses. The point has been chosen
from the third line in Tab. 5.1, i.e. $\beta = 5.5$ and $\kappa = 0.160$. It corresponds to lighter quark masses and
should shed some light on the scaling behavior of the algorithms under consideration.
The updating sequence of the multiboson algorithm has been chosen identical to the previous run, see
Tab. 5.6. The number of boson fields had to be increased, however, and is now $n_1 = 42$. This results
in an acceptance rate of 65.85%.

As has been found in Eq. (4.12), the total cost for a single trajectory should depend quadratically on
the number of boson fields, $n_1$. From the cost obtained in Tab. 5.9 for $n_1 = 24$ we read off that an
estimate for the cost with $n_1 = 42$ is given by

$$E_{\text{indep}} \approx \left(\frac{42}{24}\right)^2 \times (393360 \pm 29920) = 1204665 \pm 91630 . \qquad (5.1)$$

This estimate neglects the non-quadratic contribution of the correction step to the trajectory, but should
still be a good approximation given the fact that the updating sweeps dominate the total cost.
The number of trajectories performed in each case together with the average plaquette is listed in
Tab. 5.10. The plaquettes coincide within their standard errors.
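
The arithmetic behind Eq. (5.1) is a pure rescaling by the squared ratio of boson field numbers. A minimal sketch, assuming the purely quadratic dependence quoted above and a hypothetical helper name:

```python
def scale_cost_quadratically(cost, cost_err, n_old, n_new):
    """Rescale a cost (and its error) measured with n_old boson fields to
    n_new boson fields, assuming cost is proportional to n**2."""
    factor = (n_new / n_old) ** 2
    return factor * cost, factor * cost_err


# MB-OR effort for an independent plaquette measurement at n_1 = 24 (Tab. 5.9)
estimate, estimate_err = scale_cost_quadratically(393360, 29920, n_old=24, n_new=42)
print(f"E_indep(n_1 = 42) ~ {estimate:.0f} +- {estimate_err:.0f}")
```

The scale factor is $(42/24)^2 = 3.0625$, which reproduces the numbers quoted in Eq. (5.1).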



| Algorithm | Trajectories | Plaquette |
|---|---|---|
| HMC | 5003 | 0.56077(6) |
| MB-OR | 9910 | 0.56067(5) |

Table 5.10: Statistics and average plaquette for the HMC and the MB-OR algorithms at $\beta = 5.5$ and
$\kappa = 0.160$.

The autocorrelation functions corresponding to the plaquette histories are displayed in Figs. 5.4 and 5.5.

Figure 5.4: Autocorrelation function and integrated autocorrelation time for the plaquette histories of
the HMC algorithm.

Figure 5.5: Autocorrelation function and integrated autocorrelation time for the plaquette histories of
the MB-OR algorithm.



The corresponding efforts for a single trajectory, the integrated autocorrelation times for the plaquettes
and the resulting efforts are given in Tab. 5.11. The statistics for the HMC algorithm are now more
than 140 autocorrelation times, while the MB-OR has generated about 160 autocorrelation times.

| Algorithm | Effort/Trajectory | $\tau_{\text{int}}$ (Plaquette) | $E_{\text{indep}}$ (Plaquette) |
|---|---|---|---|
| HMC | 42000 | 34.1 ± 3.1 | 2864400 ± 260400 |
| MB-OR | 8800 | 61.1 ± 4.1 | 1075360 ± 72160 |

Table 5.11: Autocorrelation times and efforts for independent plaquette measurements for the two
algorithms at $\beta = 5.5$ and $\kappa = 0.160$.



These numbers should allow for a reliable estimate of the efficiencies in both cases. In addition, the
cost estimate from Eq. (5.1) is compatible within errors with the measured effort $E_{\text{indep}}$ in Tab. 5.11.
In light of these results, it is clear that the MB-OR gained a lot of ground in comparison to the HMC.
For the former, the total effort to generate one statistically independent configuration only increased by
a factor of about 2.7, while for the latter the effort has increased by a factor of 8.0. The MB algorithm
has become an overall factor of almost three more effective than the HMC. For the simulation of two
light, degenerate fermion flavors, the algorithm of choice is therefore definitely a multiboson algorithm.
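
The factors quoted in this comparison follow directly from the central values in Tabs. 5.9, 5.10, and 5.11. The short sketch below recomputes them; the dictionary layout is only an illustration.

```python
# E_indep for the plaquette in MV-mults at beta = 5.5 (Tabs. 5.9 and 5.11)
effort = {
    "HMC": {"kappa_0.159": 356400, "kappa_0.160": 2864400},
    "MB-OR": {"kappa_0.159": 393360, "kappa_0.160": 1075360},
}

# Growth of the effort when going from kappa = 0.159 to kappa = 0.160
growth = {alg: e["kappa_0.160"] / e["kappa_0.159"] for alg, e in effort.items()}

# Relative efficiency of MB-OR over HMC at the lighter quark mass
gain = effort["HMC"]["kappa_0.160"] / effort["MB-OR"]["kappa_0.160"]

# Run lengths at kappa = 0.160 in units of tau_int (Tabs. 5.10 and 5.11)
run_length = {"HMC": 5003 / 34.1, "MB-OR": 9910 / 61.1}

print(growth, gain, run_length)
```

This yields growth factors of roughly 8.0 (HMC) and 2.7 (MB-OR), a relative gain of about 2.7, and run lengths of about 147 and 162 autocorrelation times.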

5.3 Summary 

It has been shown that all implementations of MB algorithms considered indeed produce the same
physical results as the HMC algorithm. However, MB algorithms are more complicated to operate and
tune and it has turned out that a suboptimal choice can easily lead to a degradation in performance. The
efficiency of MB algorithms depends strongly on the polynomial and the updating sequence. The optimal
setup for the polynomial at the chosen working points has been identified in Sec. 4.1.2. Furthermore, it
has been discussed in Sec. 4.2.4 that one should apply sufficiently many gauge field updates between the
correction steps to ensure a fast decorrelation. In this way, the field updates will dominate the runtime
of the algorithm and lead to an optimal exploitation of resources.

For intermediate quark masses the HMC is able to perform equivalently to a well-tuned multiboson
algorithm. When going to lighter quark masses, however, the MB will soon outperform the HMC. It
still remains to be seen to what extent a non-Hermitian polynomial approximation is a viable candidate
for further simulations in the deep chiral regime as they are planned in [238]. One may have to switch
to a Hermitian approximation after one starts to encounter "exceptional" configurations with negative
real eigenvalues to get reasonable acceptance rates. This step might be accompanied by an increase
of $n_1$. However, first indications regarding the behavior of the smallest real eigenvalues in simulations
in the deep chiral regime are given in [240] and references therein. These preliminary results hint that
in actual simulations the sign problem may be absent unless one gets extremely close to the chiral limit.
The optimal tuning of the MB algorithm can only be found after a certain runtime has already been
invested, since the best polynomial approximating the fermionic contribution to the action can only be
obtained from one or more thermalized gauge field configurations. This additional effort requires more
logistics and should also be considered when estimating the efficiencies.

Due to the price in complexity one has to pay, the HMC can consequently still be the preferred choice 
whenever it can be expected to be comparable or only slightly inferior to MB algorithms. Nonetheless, 
for simulations at very light quark masses close to the physical regime, it cannot be expected that the 
HMC is competitive anymore. An excellent candidate for future simulations at such masses is therefore 
the MB algorithm. 

Up to this point, the emphasis has been put on the simulation of two degenerate dynamical fermion
flavors. However, as has been argued in Sec. 2.5.3, realistic numerical simulations of Lattice QCD require
a simulation with three dynamical fermionic degrees of freedom. Reference [10] shows that it is sufficient
to concentrate first on the case of three mass-degenerate dynamical fermion flavors with dynamical quark
masses of the order of $m_s/4$. One possible goal is to obtain the Gasser-Leutwyler coefficients from
those runs. However, such an endeavor requires lattice sizes and Wilson-matrix condition numbers
beyond what we are capable of handling today.

In this section, a first step in this type of program will be taken, namely the application of a multiboson
algorithm with TSMB correction step to this physically interesting situation.

In order to set the stage, we will work on $\Omega = 8^4$ and $\Omega = 16 \times 8^3$ lattices. This will help to acquire
some insight into the chances of doing more realistic simulations on $\Omega = 32 \times 16^3$ lattices, as previously
carried out for $N_f = 2$ in the SESAM project [228, 133]. So the question is whether, in the $N_f = 3$
scenario, we can establish an operational window to achieve a reasonably large pion correlation length
without hitting the shielding transition that has been found in $N_f = 2$ at finite volumes and fixed $\beta$,
as $\kappa$ was increased towards $\kappa_{\text{crit}}$.

Section 6.1 gives a short overview of the determination of the non-zero temperature crossover and the
shielding transition. It is important to avoid this region in parameter space since the physical properties
of the non-hadronized region are different from the zero-temperature phase of QCD. In particular, no
hadrons are expected to exist and consequently one cannot extract useful information on their masses.
The physically interesting point in parameter space in an infinite lattice volume $\Omega \to \infty$ is the critical
point where the Wilson matrix describes massless fermions. This property has been discussed in
Sec. 2.6.4. The practical ways to find this chiral limit are reviewed in Sec. 6.2.

The application to two different values of $\beta$ is discussed in Sec. 6.3. These runs have been performed
with the TSMB algorithm and might allow the identification of a potential working point for future simulations.
At this stage we would like to mention some previous algorithmic work on $N_f = 3$ physics, which was
mainly carried out at finite temperatures. Reference [241] presents a detailed study of the thermodynamical
properties of three flavor QCD. It employs the R-algorithm for the numerical simulations.
Note that, algorithmically, extensions of the HMC can also be used for these kinds of simulations.



6.1 The Non-Zero Temperature Crossover 

It is expected that the phase space of QCD contains a "deconfined phase", where chiral symmetry 
is restored and the quarks and gluons form a plasma with color-charges being Debye-screened. This 
transition takes place at some critical temperature. For general introductions to this topic consult 
|24l I32j . This phase is interesting for the description of hadronic matter at high temperatures and 
densities. However, when performing simulations relevant for the low-temperature phase of QCD - 
where the phenomenology is dominated by hadronized particles — this phase should be avoided. 
This transition is accompanied by a jump in the free energy of the system. An order parameter is given
by the Polyakov loop, which is defined to be [243]:

$$P(\vec{x}) = \frac{1}{3} \operatorname{Tr} \prod_{x_0=1}^{L_0} U_0(x_0, \vec{x}) , \qquad (6.1)$$

with $\vec{x}$ being a point in three-space, and $L_s$ and $L_0$ the spatial and temporal lattice sizes, respectively.
The physical picture of $P(\vec{x})$ is the description of the average world line of a static quark. Information
about the free energy of a static quark-antiquark pair can be obtained from the correlation of two such
loops, denoted by $L(\vec{x})$ in the following, having opposite direction,

$$\Gamma(\vec{x},\vec{y}) = \langle L(\vec{x}) L^\dagger(\vec{y}) \rangle . \qquad (6.2)$$

One can show [32] that this quantity is related to the free energy $F_{q\bar{q}}(\vec{x},\vec{y})$ of a static quark-antiquark
pair via

$$\Gamma(\vec{x},\vec{y}) = \exp\left[ -\beta F_{q\bar{q}}(\vec{x},\vec{y}) \right] . \qquad (6.3)$$

Assuming that $\Gamma(\vec{x},\vec{y})$ satisfies clustering, one finds

$$\Gamma(\vec{x},\vec{y}) = \langle L(\vec{x}) L^\dagger(\vec{y}) \rangle \xrightarrow{|\vec{x}-\vec{y}| \to \infty} |\langle L \rangle|^2 . \qquad (6.4)$$

Hence, if $\langle L \rangle = 0$, the free energy increases for large $|\vec{x} - \vec{y}|$ with the separation of the
quarks. This is a signal for the hadronization phase.

Therefore the order parameter indicates the phase of the system via

$$\langle P \rangle \begin{cases} = 0 & \text{hadronization} , \\ \neq 0 & \text{finite-temperature phase} . \end{cases} \qquad (6.5)$$

This argumentation so far is only valid in the absence of dynamical quarks. It may, however, also be
extended to the case of dynamical quarks with finite mass. In this case, the Polyakov loop
might similarly indicate the non-zero temperature crossover.
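
To make the definition of the Polyakov loop concrete, the following sketch evaluates the ordered product of time-like links at every spatial site. The array layout `(L0, Ls, Ls, Ls, 3, 3)`, the normalization by $1/3$, and the function name are assumptions made for illustration and are not taken from the production code of this work.

```python
import numpy as np


def polyakov_loop(temporal_links):
    """Polyakov loop at every spatial site.

    temporal_links: complex array of shape (L0, Ls, Ls, Ls, 3, 3) holding the
    time-like links U_0(x0, x); this storage layout is an assumption.
    """
    line = temporal_links[0]
    for links_at_t in temporal_links[1:]:
        # Batched 3x3 matrix product at every spatial site, ordered in time.
        line = line @ links_at_t
    return np.trace(line, axis1=-2, axis2=-1) / 3.0


L0, Ls = 4, 2
# "Cold" configuration: every link is the unit matrix.
cold = np.broadcast_to(np.eye(3, dtype=complex), (L0, Ls, Ls, Ls, 3, 3)).copy()
p_cold = polyakov_loop(cold)

# Multiplying all time-like links on one time slice by a center element z
# rotates every Polyakov loop by z.
z = np.exp(2j * np.pi / 3)
twisted = cold.copy()
twisted[0] *= z
p_twisted = polyakov_loop(twisted)

print(p_cold.mean(), p_twisted.mean())
```

The second configuration illustrates why the loop is sensitive to the center sectors: the plaquette is unchanged by such a transformation, while the Polyakov loop picks up the phase $z$.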

Up to this point, the discussion has always considered the case where the temporal lattice extension
is smaller than the spatial one, $L_0 < L_s$. In actual simulations, the situation can also arise that a
transition similar to the non-zero temperature crossover occurs for too small lengths $L_s$, even if $L_0$
is sufficiently large. In this case, one is similarly unable to measure hadronic masses properly. This
phenomenon is called the shielding transition.



6.2 The Chiral Limit 

Of particular importance for any simulation of QCD is the critical line in parameter space, where the
mass of the pion vanishes. The vicinity of this point allows for a treatment using $\chi$PT, as has been
argued in Sec. 2.5.3. As explained in Sec. 2.6.4, the Wilson matrix then contains a zero-mode.
The critical line can be found by varying the hopping parameter $\kappa$ appearing in the action Eq. (2.89)
at a fixed value of the bare gauge parameter $\beta$. Then one has to find the critical value $\kappa_{\text{crit}}$, where the
fermionic contribution to the action describes massless fermions. Repeating this procedure for several
values of $\beta$ yields the critical line in parameter space. This procedure is impeded once the shielding
transition sets in.

A qualitative illustration of the shielding transition and the critical behavior is given in Fig. 6.1. The
figure shows the squared pseudoscalar meson mass, $(am_\pi)^2$, at a fixed value of $\beta$, as a function of $1/\kappa$.
The solid curve shows the mass in the infinite volume limit, $\Omega \to \infty$. The dotted line corresponds to a
correlation length $\xi_\pi^2 = 1/(am_\pi)^2 = 1$.

As has already been pointed out, finite-size effects (FSE) will induce the shielding transition, which
might inhibit a reliable extraction of zero-temperature physics. We illustrate this scenario by sketching
the FSE for two different volumes, $\Omega^{(1)} < \Omega^{(2)}$, with lengths $L_s^{(1)} < L_s^{(2)}$. When measuring the mass on
the smaller lattice volume, $\Omega^{(1)}$, one finds that the curve can be followed reliably up to the point $\kappa^{(1)}_{\text{shield}}$.
Beyond this point, the shielding transition sets in and the mass can no longer be measured correctly on
the smaller volume. The larger lattice volume, $\Omega^{(2)}$, allows one to go closer to the critical point, but will
still run into finite-size effects at some higher value, $\kappa^{(2)}_{\text{shield}}$. The "true" value of $\kappa_{\text{crit}}$ (as defined in the
limit $\Omega \to \infty$) can be estimated the better, the larger the available volume.





Figure 6.1: Sketch of the FSE-induced shielding transition. The squared pseudoscalar meson mass,
$(am_\pi)^2$, is plotted vs. the inverse hopping parameter $1/\kappa$.



Lattice results become meaningful once the pseudoscalar correlation length, $\xi_\pi$, stays larger than unity,
$\xi_\pi > 1$. This condition is impossible to fulfill on the lattice $\Omega^{(1)}$, where the shielding transition sets in
before the desired parameter region is reached. On the lattice $\Omega^{(2)}$, however, it is in fact possible to go
beyond $\xi_\pi > 1$ before shielding is observed. Hence, for a given set of parameters one has to increase the
lattice volume until one reaches a "window", where the FSE are under control while the mass has already
become sufficiently small.

But how can $\kappa_{\text{crit}}$ be estimated? On a given large enough lattice, one can use the following recipes:

1. the point in $\kappa$-space where the condition number of the Hermitian Wilson matrix $Q$ diverges,

2. the point where the smallest real eigenvalue of the non-Hermitian Wilson matrix $Q$ reaches the
imaginary axis,

3. the point where the pseudoscalar meson mass $(am_\pi)$ vanishes. This is the physical definition of
the chiral limit.

Comment: From a physical point of view, the last criterion is the approach of choice for estimating
the critical point. The first two definitions will coincide and give identical results since, in both cases,
the matrix contains a zero-mode. Furthermore, the mass of the pseudoscalar mesons is strongly dominated
by the smallest eigenvalues and this dominance becomes more pronounced as the chiral limit is
approached, cf. [139]. Hence, the results from all these methods will coincide sufficiently close to the
chiral limit¹. For larger masses, however, one can expect that the results from the methods differ in
practical simulations.

To properly apply the physical definition, one can use an extrapolation inspired by $\chi$PT. To be specific,
one employs (see e.g. [228])

$$\left( \frac{1}{\kappa} - \frac{1}{\kappa_{\text{crit}}} \right) \propto (am_\pi)^2 . \qquad (6.6)$$

¹ However, it is extremely difficult to actually work "sufficiently close" to the chiral limit.

Strictly speaking, $\chi$PT only applies in the continuum. However, it is customary to nonetheless use
this type of fitting function at a fixed value of $\beta$, see again [228] and also [227] for the latest results.
Furthermore, this relation might have to be modified by logarithmic corrections, which could cause the
linear behavior predicted by Eq. (6.6) to be inaccessible in current simulations. For the moments
of structure functions it has indeed been shown that a logarithmically modified extrapolation
formula appears to yield the best agreement with experimental data. Therefore, one should be careful when
interpreting predictions obtained by linear fits only.

6.3 Explorative Studies 

In this section, results from simulations at two different values of $\beta$ are presented, namely at $\beta = 5.3$
(Sec. 6.3.1) and $\beta = 5.2$ (Sec. 6.3.2). For several values of $\kappa_{\text{sea}}$, the average plaquette is determined. In
both cases, the critical value, $\kappa_{\text{crit}}$, is measured. In the latter case, both methods discussed in Sec. 6.2
are applied, while in the former case only a single method is used.
A discussion about prospects for future simulations concludes these investigations.

6.3.1 The Case $\beta = 5.3$

The simulations discussed here have been run at a value of $\beta = 5.3$ on lattices with volume $\Omega = 8^4$ and
varying values of $\kappa_{\text{sea}}$. The different values of $\kappa_{\text{sea}}$, the number of trajectories after thermalization, and


| $\kappa_{\text{sea}}$ | Number of confs. | Plaquette | $\tau_{\text{int}}$ |
|---|---|---|---|
| 0.125 | 3750 | 0.4627(6) | 75 |
| 0.135 | 15710 | 0.4717(7) | 281 |
| 0.145 | 15700 | 0.4840(6) | 297 |
| 0.150 | 11600 | 0.4956(8) | 265 |
| 0.155 | 9400 | 0.5118(8) | 305 |
| 0.160 | 6100 | 0.5498(18) | |
| 0.161 | 6200 | 0.5533(3) | |
| 0.162 | 5600 | 0.5564(5) | |
| 0.163 | 5500 | 0.5595(5) | |

Table 6.1: Hopping parameter, $\kappa_{\text{sea}}$, number of trajectories, and plaquette values with resulting
autocorrelation times for the runs with $N_f = 3$ and $\beta = 5.3$.

the resulting average plaquette values are listed in Tab. 6.1 together with an estimate for the integrated
autocorrelation time of the average plaquette. The standard errors on the plaquettes together with the
estimate for $\tau_{\text{int}}$ have been determined using the Jackknife method. This data has been obtained from
runs on both the Nicse and the ALiCE clusters, see Sec. 4.3.2 for further details. The algorithmic
parameters have been varied in the runs. Table 6.2 shows the algorithmic parameters together with the
resulting acceptance rates.

The average plaquette is visualized in Fig. 6.2. Between $\kappa_{\text{sea}} = 0.150$ and $\kappa_{\text{sea}} = 0.160$ a large jump in
the plaquette occurs, which indicates the presence of the shielding transition. The values beyond this
transition are therefore not particularly interesting and hence less statistics has been generated. An
estimate for the autocorrelation time has not been obtained here. Therefore, the statistical error may
be underestimated.

For the determination of the critical value, $\kappa_{\text{crit}}$, the first method from Sec. 6.2 is adopted. Table 6.3
shows the average smallest and largest eigenvalues of $Q^2$. The eigenvalues have been computed every
100 trajectories, and the errors have again been estimated using the Jackknife method.

Table 6.2: Algorithmic parameters for the runs with three dynamical quark flavors at $\beta = 5.3$.

Figure 6.2: Average plaquettes for runs with three dynamical flavors at $\beta = 5.3$.



Figure 6.3 shows the resulting plot of the inverse condition number $\lambda_{\min}/\lambda_{\max}$ of $Q^2$ vs. $1/\kappa$.

The straight line is a fit to the points between $1/\kappa = 6.452$ and $1/\kappa = 8.0$, which is parameterized by

$$\lambda_{\min}/\lambda_{\max} = -0.4610(13) + 0.07252(17)/\kappa . \qquad (6.7)$$

From the point where $\lambda_{\max}/\lambda_{\min}$ diverges (and thus $\kappa \to \kappa_{\text{crit}}$) one finds

$$\kappa_{\text{crit}} = 0.1573(4) . \qquad (6.8)$$

This method requires little effort and has a rather small error on the critical value of $\kappa$. However, the
estimate (6.8) still contains a systematic uncertainty due to the fact that one is still rather far from the
chiral regime.
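
As a check of the central value in Eq. (6.8), one can set the inverse condition number in Eq. (6.7) to zero and solve for $\kappa$, using the quoted central values of the fit coefficients only:

$$\frac{1}{\kappa_{\text{crit}}} = \frac{0.4610}{0.07252} \approx 6.357 , \qquad \kappa_{\text{crit}} \approx 0.1573 .$$

The quoted error is not reproduced by this simple estimate, since it would require the correlation between the two fit parameters, which is not given here.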

6.3.2 The Case $\beta = 5.2$

The point considered in the previous section already showed signs of the shielding transition while the
condition number of $Q^2$ was still below 100. Hence, this value of $\beta$ does not allow one to probe the chiral
regime further if one is limited to such small lattices. It can, however, be considered as a working
point for future studies on larger lattices. As a different starting point, the focus will now be placed

| $\kappa_{\text{sea}}$ | $\lambda_{\min}$ | $\lambda_{\max}$ | $\lambda_{\max}/\lambda_{\min}$ |
|---|---|---|---|
| | 0.00764(65) | 2.2367(18) | 292.9 ± 25.1 |
| | 0.00894(75) | 2.2545(6) | |
| | 0.00633(51) | 2.2722(8) | 359.2 ± 28.9 |
| 0.163 | 0.00885(66) | 2.2908(12) | 258.9 ± 19.3 |

Table 6.3: Average extremal eigenvalues and condition numbers for runs with three dynamical flavors
at $\beta = 5.3$.

Figure 6.3: Inverse condition number of $Q^2$ vs. $1/\kappa$ for three dynamical fermions at $\beta = 5.3$.



on the point $\beta = 5.2$ with lattice sizes of $\Omega = 16 \times 8^3$ instead. This lattice size might already allow
for a measurement of the ratio $m_\pi/m_\rho$ for degenerate sea and valence quark masses and hence for an
independent estimate of the chiral transition. Again, the finite-temperature phase of QCD has to be
avoided.

For the actual simulation, again several values for $\kappa_{\text{sea}}$ have been chosen. The polynomial parameters
are given in Tab. 6.4. The runs have been performed on the ALiCE cluster with a partition of eight
nodes for each run.



| $n_1$ | $n_2$ | $n_3$ | $[\epsilon, \lambda]$ | Updates / Configuration |
|---|---|---|---|---|
| 24 | 300 | 450 | $[7.5 \times 10^{-4}, 3]$ | 1 boson HB, 5 boson OR, 2 gauge Metropolis, 1 noisy corr. |

Volume: $\Omega = 16 \times 8^3$

Table 6.4: Algorithmic parameters for each configuration for the runs at $\beta = 5.2$ with $N_f = 3$.

In general, one can expect that the polynomial orders and intervals are chosen somewhat conservatively
and one could achieve some gain by adapting them manually with respect to the spectrum of $Q^2$
obtained during the production. Despite the lengths of the runs, it might still make sense to improve
the statistics further.

The working points chosen are listed in Tab. 6.5 together with the acceptance rate of the noisy correction
step, the number of performed trajectories, and the average plaquette with the error determined
from the Jackknife method. From the Jackknife estimate, the plaquette autocorrelation time has been
determined. Finally the correction factor with its standard deviation is shown.

Table 6.5: Simulation runs using three dynamical fermion flavors at $\beta = 5.2$.



The plaquette for the run at $\kappa_{\text{sea}} = 0.162$ showed a fluctuation between two different points and is
plotted in Fig. 6.4. This is an indication that the shielding transition takes place around this point.
Since the series is too short to make any statement about this fluctuation, the standard error is not
shown here.




Figure 6.4: Plaquette history of the run at $\beta = 5.2$ and $\kappa_{\text{sea}} = 0.162$ with $N_f = 3$, plotted against
the trajectory number.

In the cases $\kappa_{\text{sea}} = 0.156$ and $\kappa_{\text{sea}} = 0.158$ the autocorrelation time appears to be very large. Hence,
the statistics are still comparatively small at these working points.

The magnitude of the reweighting factors in Tab. 6.5 confirms the expectation that the polynomial has
been chosen very conservatively in most cases. However, the run at $\kappa_{\text{sea}} = 0.164$ has a large fluctuation
in the reweighting factor, which means that the smallest eigenvalue left the polynomial interval.
The precise situation is displayed in Fig. 6.5 after the thermalization phase has been subtracted. If this
run were to be continued, one might consider using polynomials with a smaller lower limit of
the approximation interval. The properly reweighted values may still be used for this analysis, but the
statistics may be worse for this case. For the other simulation runs, one can conclude that reweighting
can safely be disregarded.

Figure 6.6 shows the resulting values of the average plaquette as a function of the hopping parameter
$\kappa_{\text{sea}}$. This plot corroborates that the shielding transition is located around $\kappa_{\text{sea}} = 0.162$.

Figure 6.5: History of reweighting factors for the run at $\beta = 5.2$ and $\kappa_{\text{sea}} = 0.164$ with $N_f = 3$.

Figure 6.6: Average plaquettes for runs with three dynamical flavors at $\beta = 5.2$.

Figure 6.7: Polyakov loops along the shortest length, $L_s$, for the simulation of three dynamical
fermion flavors at $\beta = 5.2$.

Locating the Shielding Transition

As a first guideline of where the crossover to the shielded phase takes place, the plaquette fluctuation in
Fig. 6.4 and the jump in the average plaquette in Fig. 6.6 have been considered. To gain further insight
one can investigate the behavior of the average Polyakov loop, see Sec. 6.1. However, in this case it
should now be measured in the spatial (i.e. $x$, $y$, and $z$) directions since the $t$-direction is now the longest.
The Polyakov line in the $t$-direction can be expected to show no sign of the finite-temperature phase.
The Polyakov loops have been measured every 100 trajectories. The resulting values are shown in
Fig. 6.7. Starting with $\kappa_{\text{sea}} = 0.163$, one clearly sees a clustering in one of the three sectors. It is
surprising that despite the rather long runs, in each case the values are clustered in only one sector.
This indicates that the samples are not decorrelated with respect to this observable. At $\kappa_{\text{sea}} = 0.162$ the
shielding transition is not yet apparent in the Polyakov loop. However, when considering the previous
indications, it appears safer to disregard the latter run from the following analysis.
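
The clustering of Polyakov loop values in one sector can be quantified by assigning each measurement to the nearest center element. The sketch below assumes that the three sectors are the $Z_3$ directions $e^{2\pi i k/3}$, $k = 0, 1, 2$, in the complex plane; the function name and the sample values are hypothetical and do not come from the runs discussed here.

```python
import cmath
import math
from collections import Counter


def z3_sector(polyakov_value):
    """Return k in {0, 1, 2} such that exp(2 pi i k / 3) is the center
    element closest in phase to the given complex Polyakov loop value."""
    angle = cmath.phase(polyakov_value) % (2.0 * math.pi)
    return round(angle / (2.0 * math.pi / 3.0)) % 3


# Hypothetical measurements, all scattered around the k = 1 direction.
samples = [0.08 * cmath.exp(1j * (2.0 * math.pi / 3.0 + d)) for d in (-0.3, 0.0, 0.2)]
sector_counts = Counter(z3_sector(p) for p in samples)
print(sector_counts)
```

A history that stays in a single sector over the whole run, as observed here, would show up as a single populated key in `sector_counts`.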

Computing $m_\pi/m_\rho$

The details for the measurement of hadronic masses have been given in Sec. 3.3. As has been discussed
above, only the points $\kappa_{\text{sea}} \le 0.160$ should be considered for this analysis. The reweighting factor has
been included, although it had no practical influence in these productions.

For the run with $\kappa = 0.156$, the correlation functions for the (non-singlet) pseudoscalar and the vector
mesons are visualized in Figs. 6.8 and 6.9. These functions have already been symmetrized, i.e. the plot
shows

$$\Gamma^{\text{sym}}_{\pi,\rho}(t) = \frac{1}{2} \left( \Gamma_{\pi,\rho}(t) + \Gamma_{\pi,\rho}(L_0 - t) \right) ,$$

with $L_0$ being the lattice extension in $t$-direction.
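
The symmetrization can be sketched in a few lines. The example assumes a correlator stored as an array `corr[t]` for $t = 0, \dots, L_0 - 1$ with periodic boundary conditions in time, so that $t = L_0$ is identified with $t = 0$; the sample numbers are invented for illustration.

```python
import numpy as np


def symmetrize(corr):
    """Average a periodic correlator with its time-reflected copy:
    result[t] = (corr[t] + corr[(L0 - t) % L0]) / 2."""
    corr = np.asarray(corr, dtype=float)
    L0 = len(corr)
    t = np.arange(L0)
    return 0.5 * (corr + corr[(L0 - t) % L0])


# Invented correlator on a lattice with L0 = 8 time slices.
raw = [10.0, 5.0, 3.0, 2.0, 1.0, 2.5, 3.5, 6.0]
sym = symmetrize(raw)
print(sym)
```

After symmetrization only the time slices $t = 0, \dots, L_0/2$ carry independent information, which is why the fits discussed below end at $t = 7$ on the $L_0 = 16$ lattice.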

These functions should follow the behavior given in Eq. (3.36). However, for small values of $t$, one
expects the results to be too large (due to the contamination with higher modes, cf. Sec. 3.3), while for
larger values of $t$, larger autocorrelations of the greater lengths may result in worse statistics.
To obtain an estimate for the autocorrelation time of these masses, the Jackknife method has again
been employed. The case which is considered in detail is the run at $\kappa = 0.156$. Figures 6.10 and 6.11
show the variances of the masses for a fit interval from timeslice $t = 5$ to timeslice $t = 7$.



| Quantity | Expect. value | Variance $\sigma^2$ | $\sigma^2(B = 1)$ | $\tau_{\text{int}}$ |
|---|---|---|---|---|
| $(am_\pi)$ | 1.374(12) | $1.436 \times 10^{-4}$ | $1.163 \times 10^{-5}$ | 6.18 |
| $(am_\rho)$ | 1.440(13) | $1.803 \times 10^{-4}$ | $1.804 \times 10^{-5}$ | 5.00 |

Table 6.6: Jackknife variances together with the corresponding variances for bin size $B = 1$ for the
simulation run at $\beta = 5.2$ and $\kappa = 0.156$. From there, an estimate for the integrated
autocorrelation time is obtained.

The resulting values are given in Tab. 6.6 together with the variance estimate for bin size $B = 1$. By
exploiting the corresponding relation from Chapter 3, one can as usual obtain an estimate for the
autocorrelation time of the quantity under consideration.
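
The numbers in Tab. 6.6 are consistent with the standard estimate $\tau_{\text{int}} \approx \sigma^2 / (2\,\sigma^2(B = 1))$, where $\sigma^2$ is the plateau value of the binned Jackknife variance. The sketch below assumes this relation and, for simplicity, applies the Jackknife to a plain average rather than to a fitted mass; both function names are hypothetical.

```python
import numpy as np


def jackknife_variance_of_mean(series, bin_size=1):
    """Jackknife variance of the mean after averaging the series in
    consecutive bins of length bin_size (incomplete last bin is dropped)."""
    data = np.asarray(series, dtype=float)
    n_bins = len(data) // bin_size
    bins = data[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    # Leave-one-bin-out estimates of the mean.
    leave_one_out = (bins.sum() - bins) / (n_bins - 1)
    spread = np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    return (n_bins - 1) / n_bins * spread


def tau_int_from_variances(var_plateau, var_bin1):
    """Integrated autocorrelation time in units of the measurement spacing,
    assuming tau_int = var_plateau / (2 * var_bin1)."""
    return var_plateau / (2.0 * var_bin1)


# Variances from Tab. 6.6 (kappa = 0.156), in units of measurements.
tau_pi = tau_int_from_variances(1.436e-4, 1.163e-5)
tau_rho = tau_int_from_variances(1.803e-4, 1.804e-5)
print(f"tau_int(pi) ~ {tau_pi:.2f}, tau_int(rho) ~ {tau_rho:.2f}")
```

Since the correlators were measured every 100 trajectories, multiplying these values by 100 converts them into trajectories.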

Figure 6.8: Symmetrized correlation function for the pseudoscalar meson with three dynamical fermions
at $\beta = 5.2$ and $\kappa = 0.156$.

Figure 6.9: Symmetrized correlation function for the vector meson with three dynamical fermions at
$\beta = 5.2$ and $\kappa = 0.156$.



Since the correlators have been computed every 100 trajectories, the results imply that the $\pi$- and
$\rho$-mesons have autocorrelation times of $\tau^{\pi}_{\text{int}} \approx 618$ and $\tau^{\rho}_{\text{int}} \approx 500$ trajectories, respectively. These numbers
are slightly better than what the plaquette has indicated, albeit still large. A source of this problem
is the choice of the first polynomial. If the GMRES method had been used instead, one might have
achieved a faster decorrelation by reducing the polynomial order, $n_1$, see Sec. 4.1.2 and also Sec. 5.2.3.
Figure 6.12 shows the resulting values for the masses in lattice units. The lower limit of the fit is given by
the timeslice $t$, while for the upper limit the next-to-last timeslice has always been used, i.e. $t_{\max} = 7$. The
error is again taken to be the standard error, which has been computed using the Jackknife procedure
as above in Tab. 6.6. The method follows the results discussed in [227], among others.
The plateaus in Fig. 6.12 are reached at $t = 5$. Therefore, the values obtained at this point will be used
in the following. Table 6.7 summarizes all results together with the autocorrelation times determined
using the Jackknife scheme. In the case $\kappa = 0.160$, no plateau could be identified and the results
are compatible with an integrated autocorrelation time below 100 trajectories.

Figure 6.10: Jackknife variances for different bin sizes for the mass of the pseudoscalar meson with
three dynamical fermions at $\beta = 5.2$ and $\kappa = 0.156$. The mass is obtained from a fit to an
interval $[t_{\min} = 5, t_{\max} = 7]$.

Figure 6.11: Jackknife variances for different bin sizes for the mass of the vector meson with three
dynamical fermions at $\beta = 5.2$ and $\kappa = 0.156$. The mass is obtained from a fit to an
interval $[t_{\min} = 5, t_{\max} = 7]$.

Figure 6.12: Meson masses in lattice units as a function of the fitting interval for three dynamical
fermions at $\beta = 5.2$.

Table 6.7: Masses and their autocorrelation times for the determination of the ratio $m_\pi/m_\rho$ for three
dynamical fermions at $\beta = 5.2$.



Locating the Critical Point 

To locate the critical point, again the smallest and largest eigenvalues have been computed and the
condition numbers have been determined for the runs at $\beta = 5.2$. The eigenvalues have been computed
every 100 trajectories. Table 6.8 summarizes the findings.



| $\kappa$ | $\lambda_{\min}$ | $\lambda_{\max}$ | $\lambda_{\max}/\lambda_{\min}$ |
|---|---|---|---|
| 0.156 | 0.0330(6) | 2.2137(5) | 67.1 ± 1.1 |
| 0.158 | 0.0249(3) | 2.2503(9) | 90.4 ± 3.2 |
| 0.160 | 0.0178(2) | 2.2879(5) | 128.5 ± 3.3 |

Table 6.8: Average extremal eigenvalues and condition numbers for runs with three dynamical flavors
at $\beta = 5.2$.



The inverse condition number is plotted vs. $1/\kappa$ in Fig. 6.13.

Figure 6.13: Inverse condition number of $Q^2$ vs. $1/\kappa$ for three dynamical fermions at $\beta = 5.2$.

The fit to all three data points yields

$$\lambda_{\min}/\lambda_{\max} = -0.272(18) + 0.0448(28)/\kappa . \tag{6.9}$$

The zero of the line gives

$$\kappa_{\rm crit} = 0.1645(29) . \tag{6.10}$$
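The extrapolation can be retraced from the numbers in Table 6.8. The following sketch uses an unweighted least-squares line, which is an assumption made here for simplicity; the fit quoted above presumably takes the statistical errors into account, so the coefficients agree only approximately. All variable names are illustrative.

```python
# Values of kappa and condition numbers lambda_max/lambda_min from Table 6.8.
kappas = [0.156, 0.158, 0.160]
condition_numbers = [67.1, 90.4, 128.5]

x = [1.0 / k for k in kappas]              # 1/kappa
y = [1.0 / c for c in condition_numbers]   # lambda_min/lambda_max

# Unweighted least-squares line y = intercept + slope * x.
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean

# The line vanishes at 1/kappa_crit = -intercept / slope.
kappa_crit = -slope / intercept
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}, kappa_crit = {kappa_crit:.4f}")
```

Within the quoted errors, slope, intercept and zero of this simplified fit are compatible with the coefficients of the fit and the critical value given above.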

Finally, the fitting function from Eq. (6.6) is applied to the situation at hand with the pion masses
given by Tab. 6.7. In Fig. 6.14, the square of the pion mass, $(am_\pi)^2$, is plotted versus $1/\kappa$.
In addition, the rho mass, $(am_\rho)$, is also included in this plot. The former
is visualized as circles, while the latter is pictured by squares. The shielding transition is shown as a
magenta bar.

Figure 6.14: Square of the pion mass, $(am_\pi)^2$, (black circles) and the rho mass, $(am_\rho)$, (black squares)
for $\beta = 5.2$ with $N_f = 3$.

The linear fit to $(am_\pi)^2$ is given by the solid green line in Fig. 6.14. The curve is parameterized by

$$(am_\pi)^2 = -(19.60 \pm 1.94) + (3.359 \pm 0.309)/\kappa . \tag{6.11}$$

The critical value $\kappa_{\rm crit}$ is then found to be

$$\kappa_{\rm crit} = 0.1713(67) . \tag{6.12}$$

The result from Eq. (6.12) agrees within the errors with the previous result from Eq. (6.10).
Furthermore, a quadratic curve has been drawn through the values for $(am_\pi)^2$, given by the green
dashed line. It is parameterized by

$$(am_\pi)^2 = -340.81 + 105.06/\kappa - 8.05/\kappa^2 . \tag{6.13}$$

When using this curve, the resulting value for $\kappa_{\rm crit}$ is found to be

$$\kappa_{\rm crit} = 0.1659 . \tag{6.14}$$

The linear fit to $(am_\rho)$ is given by the blue line in Fig. 6.14. The curve is parameterized by

$$(am_\rho) = -(5.91 \pm 0.60) + (1.147 \pm 0.095)/\kappa . \tag{6.15}$$
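The two critical values quoted for the pion fits are the zeros of the respective curves in $1/\kappa$. A short check using only the central values of the quoted coefficients is sketched below; since these coefficients are rounded, the last digit of the result can differ slightly. The variable names are illustrative.

```python
import math

# Central values of the coefficients of the linear and the quadratic pion fit.
lin_c0, lin_c1 = -19.60, 3.359
quad_c0, quad_c1, quad_c2 = -340.81, 105.06, -8.05

# Linear fit: 0 = c0 + c1 * x with x = 1/kappa.
kappa_crit_linear = -lin_c1 / lin_c0

# Quadratic fit: 0 = c0 + c1 * x + c2 * x**2.
disc = quad_c1 ** 2 - 4.0 * quad_c2 * quad_c0
roots = [(-quad_c1 + s * math.sqrt(disc)) / (2.0 * quad_c2) for s in (+1.0, -1.0)]
# Choose the zero adjacent to the simulated range 1/kappa = 6.25 ... 6.41.
x_crit = min(roots, key=lambda r: abs(r - 6.25))
kappa_crit_quadratic = 1.0 / x_crit

print(f"linear: {kappa_crit_linear:.4f}, quadratic: {kappa_crit_quadratic:.4f}")
```

The quadratic curve has a second zero at smaller $\kappa$, on the far side of the simulated points; it is discarded here because the chiral point is approached from the simulated range towards larger $\kappa$.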

Obviously, it is not possible to reach values of $\xi_\pi > 1$ before the shielding transition sets in on the
current lattice size, cf. Sec. 6.2. Therefore, the linear extrapolation in Eq. (6.11) may be affected by
an uncontrolled systematic uncertainty. To estimate this effect, one may compare the resulting critical
point, Eq. (6.12), with the result obtained from the quadratic fit, Eq. (6.14). This uncertainty makes
further investigations closer to the chiral point necessary and consequently implies the need to go to
larger lattices.

Prospects for Future Simulations 

Up to this point, one could only achieve ratios of $m_\pi/m_\rho > 0.9$ with $\xi_\pi < 1$. When going to larger
lattices, the shielding transition will set in at larger values of $\kappa_{\rm sea}$, allowing one to probe lighter quark
masses.

A procedure for continuing along this line of research consists of going to $\Omega = 24 \times 12^3$ lattices, starting
from $\kappa_{\rm sea} > 0.160$ until the shielding transition for the new lattice sets in. In light of the fact that the
shielding transition in Fig. 6.14 is located shortly before one arrives at $\xi_\pi > 1$, a lattice size of $L_s = 12$
might already be sufficiently large to obtain a set of data points all fulfilling the requirement $\xi_\pi > 1$. In
such a situation, one could obtain an extrapolation to the chiral point, $\kappa_{\rm crit}$, with reduced systematic
uncertainty.



  Bare parameters                                       Physical parameters
  $N_f$   $\beta_{\rm spec}$   $\kappa_{\rm spec}$     $(am_\pi)$   $(am_\rho)$   $m_\pi/m_\rho$   $a$/fm
  3       5.2                  0.169(23)               0.5          0.882         0.567            0.226

Table 6.9: The suggested working point for future spectroscopic studies on lattices with $\Omega = 32 \times 16^3$
and beyond.



With the available information we can, however, still try to locate a working point at this particular
value of $\beta$ in the $\beta$-$\kappa$-plane with properties similar to the point chosen for the SESAM-project [133].
This working point will now be denoted $(\beta_{\rm spec}, \kappa_{\rm spec})$. With the uncertainties discussed above in mind,
we impose the following constraints:

$$z = \xi_\pi/L_s < 1/4 , \qquad \xi_\pi \geq 2 . \tag{6.16}$$

The actual parameters can be identified from the extrapolations Eqs. (6.11) and (6.15). First, from
setting $\xi_\pi = 2$, we obtain

$$\kappa_{\rm spec} = 0.169(23) . \tag{6.17}$$

From the requirement (6.16) that the value of the finite-size parameter should be $z < 1/4$, one finds,
in accordance with the SESAM-data from [227], that one has to go to lattices with at least $L_s = 16$ if
one wants to explore this region in parameter space.

The parameters for this working point are summarized in Tab. 6.9. The estimated values for $(am_\rho)$,
$m_\pi/m_\rho$, and $a$ have been computed from the fit (6.15). The total physical lattice size $L_s$ would then
be $L_s = 3.616$ fm. A possible criticism of this working point might be that this lattice spacing
is rather coarse. To actually increase the resolution, one would need to go to higher values of $\beta$, thus
moving closer to the continuum limit.
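The arithmetic behind the working point can be retraced in a few lines. The sketch below assumes that the pion correlation length is $\xi_\pi = 1/(am_\pi)$ and that the scale is set by a physical rho mass of 770 MeV with $\hbar c = 197.327$ MeV fm; neither number is stated in this section, they are assumptions that happen to reproduce the quoted lattice spacing. The rho mass in lattice units is taken from the table of the working point rather than recomputed.

```python
HBAR_C_MEV_FM = 197.327   # assumed conversion constant hbar*c in MeV fm
M_RHO_MEV = 770.0         # assumed physical rho mass used to set the scale

xi_pi = 2.0
am_pi = 1.0 / xi_pi       # pion mass in lattice units, 0.5

# Invert the linear pion fit (a m_pi)^2 = -19.60 + 3.359 / kappa.
kappa_spec = 3.359 / (am_pi ** 2 + 19.60)

am_rho = 0.882            # rho mass in lattice units as quoted for the working point
ratio = am_pi / am_rho
a_fm = am_rho * HBAR_C_MEV_FM / M_RHO_MEV
extent_fm = 16 * a_fm     # spatial extent for L_s = 16

print(f"kappa_spec = {kappa_spec:.4f}, m_pi/m_rho = {ratio:.3f}")
print(f"a = {a_fm:.3f} fm, L_s a = {extent_fm:.3f} fm")
```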

Finally, the question arises how large the total effort might be for such a project. For the case of
quadratically optimized polynomials, it has been argued in [218] that the required increase in $n_1$ when
going from $N_f = 2$ to $N_f = 3$ is only of the order of about 30%. Reference [240] confirms this finding
by stating that going from $N_f = 1/2$ to $N_f = 3$ will only increase $n_1$ by about 50%. Taking, as a
very conservative estimate, the latter number to be applicable also to the simulations performed for
Nf = 2 in chapterOU we find by applying Eq. (|4.12(1 that the total cost for an independent configuration 
(with respect to the plaquette) is about

$$E_{\rm indep} \simeq (1.5)^2 \cdot (1075360 \pm 72160) \approx (2420000 \pm 162000) , \tag{6.18}$$

when considering the lightest quark mass, where $m_\pi/m_\rho = 0.6695$. Hence, this estimate marks the
upper limit for the effort required in a simulation similar to the SESAM-project, provided one decides
to resort to an MB algorithm. The total cost quoted in Eq. (6.18) is still smaller than the
corresponding cost for the HMC run with $N_f = 2$. Therefore, one can expect the simulations at the
lighter quark masses to be even cheaper than they were in the case of the SESAM-project.
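The scaling of the cost estimate is plain arithmetic and can be reproduced directly. The sketch only assumes what the estimate itself uses: an increase of $n_1$ by a factor of 1.5 entering quadratically, applied to both the central value and the error of the $N_f = 2$ cost.

```python
# Cost of an independent configuration for N_f = 2 (central value and error).
cost_nf2, cost_nf2_err = 1075360.0, 72160.0

# Conservative 50% increase of the polynomial order n_1, entering quadratically.
n1_increase = 1.5
scale = n1_increase ** 2

cost_nf3 = scale * cost_nf2
cost_nf3_err = scale * cost_nf2_err
print(f"{cost_nf3:.0f} +/- {cost_nf3_err:.0f}")
```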

6.4 Summary and Outlook 

A first step towards the simulation of QCD with three degenerate dynamical quark flavors has been
taken. The TSMB algorithm has been applied successfully to this physically interesting situation. The
simulations have yielded first results for the shielding transitions on the current lattice size with $L_s = 8$
and the critical points at two values of $\beta$. It has become clear that there is no window for doing
spectroscopy at the parameters chosen.

Prospects for future simulations have been given and a potential working point has been estimated, 
although with large systematic uncertainty. It has been argued that a study with physical masses 
similar to the SESAM-project is feasible today and might even cost slightly less than the HMC-based 
program. 

In an ongoing research project, simulations of this type will be performed on larger lattices and closer
to the chiral limit. For the current status of the comprehensive project see [240].

A potential obstacle for future simulations with three dynamical fermion flavors may still be posed by the
fermionic sign problem. As has been noted in Sec. 2.6.4, the fermionic determinant will change its sign
if an odd number of real eigenvalues becomes negative. The polynomial approximations in Sec. 3.6.1,
however, are applied to the square of the Hermitian Wilson matrix. Hence, they will always yield a
positive sign. Consequently, the sign would have to be included into the measurement of observables,
which may eventually spoil the statistical quality of the sample. A similar problem is known to occur
in the simulation of gauge theories with a non-zero chemical potential, see [190].

Such a problem does not show up for an even number of degenerate fermion flavors since in that case
squaring the Wilson matrix will always yield a positive sign. The only known way to overcome this
obstacle directly in a sampling process has been found for some quantum spin systems (cf. Sec. 2.6).
It is as yet unclear whether any quantum spin system similar to gauge theories with dynamical fermion flavors
can be simulated efficiently in such a manner. However, as has already been pointed out in Sec. 5.3, it may
be that this sign problem is not significant in actual simulations of QCD.
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A. Karpow — L. Portisch 
Milano 1975, Game 2 
Position after 36. . . ., ®e6-e5 

A foresighted strategy can help to find 
a winning move in a superior position. 

In this thesis, algorithms for the simulation of quantum field theories with dynamical fermionic degrees
of freedom have been presented. Special emphasis has been put on a new class of algorithms, namely the
multiboson algorithms, which represent the fermionic determinant by a number of boson fields. They
allow for the implementation of local updating algorithms, which are known to be superior to the global
schemes applicable to gauge theories so far.

A particular variant of these algorithms, the TSMB method, has been implemented on several machines. 
This scheme relies on the computation of powers of matrices using static polynomials. The parameters 
which fix a given polynomial are the order and the interval of the approximation. Beyond that, it has 
turned out that the choice of the updating scheme is important. 

The optimal settings for these parameters have been determined and the sensitivity of the system to 
sub-optimal tuning has been analyzed. Furthermore, different updating schemes have been examined 
with respect to their efficiency and recommendations for the implementations of multiboson algorithms 
in general have been given. 

Due to the complexity of MB schemes, however, there is still room for improvement. MB algorithms
remain open for refinements in the future, but can already be used for large-scale simulations today.

Major emphasis has been put on how multiboson algorithms compare to their competitors in the field
of dynamical fermion simulations. We have shown that, with sufficient tuning, MB algorithms appear
to be superior to the HMC algorithm in the case of light quark masses. We would expect that further
improvements in MB algorithms will be found with growing experience in future simulations. This
might help the MB scheme to replace the HMC method as the standard algorithm in Lattice QCD.

The final part of this thesis has considered the application of the TSMB algorithm to the case of three
dynamical fermionic flavors, a situation which is of great importance for realistic simulations of QCD.
Based on this experience, a proposal for future simulations has been formulated. In fact, one can be
optimistic that a project similar to SESAM can be performed at reasonable cost. A working point for
simulations of this type on $\Omega = 32 \times 16^3$ lattices has been estimated, where semi-realistic simulations with good
statistics should be run. This would provide an assessment of an operating window in the $N_f = 3$
scenario.

In conclusion, we find that multiboson algorithms provide a great leap forward in the simulation of 
Lattice QCD and give us the means to perform simulations in realistic scenarios. 
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A Notations and Conventions 



Unless otherwise explicitly stated, natural units have been adopted throughout this thesis by setting

$$\hbar = c = 1 . \tag{A.1}$$

The four-dimensional Minkowski-space is denoted by $\mathbb{M}^4$ and has the canonical flat-space metric

$$g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1) . \tag{A.2}$$

The Wick rotation

$$x \to x' : \quad x' = (x'_0, \mathbf{x}') = (-i x_0, \mathbf{x})$$

transforms vectors from Minkowski-space to the Euclidean space $\mathbb{R}^4$ with the metric given by the
Kronecker symbol $\delta_{\mu\nu}$.

The discrete space of the lattice theory is denoted by $\mathbb{Z}^4$. Vectors in space are always denoted by
$x$. Unit vectors are only used in $\mathbb{Z}^4$, where they are written as $\hat{\mu}$ with $\mu = 0, \dots, 3$. If the lattice
volume is finite, the lengths $L_\mu$ in direction $\mu$ are also denoted by $L_t = L_0$ and $L_s = L_1 = L_2 = L_3$,
where $L_t$ is the lattice size in "time" and $L_s$ in "space" direction. The total volume is denoted by
$\Omega = \prod_\mu L_\mu = L_t \times L_s^3$ and the corresponding space is $\mathbb{Z}^4_\Omega$. For different lengths $L_1$, $L_2$, and $L_3$, the
notation $\Omega = L_0 \times L_1 \times L_2 \times L_3$ will be used. For a bosonic field $\phi(x)$, $x \in \mathbb{Z}^4_\Omega$, periodic boundary
conditions are imposed:

$$\phi(x + \hat{\mu} L_\mu) = \phi(x) .$$

Here $x + \hat{\mu}$ denotes the point adjacent to $x$ in direction $\mu$.
The totally antisymmetric 4-tensor $\epsilon^{\mu\nu\rho\sigma}$ obeys

$$\epsilon^{\mu\nu\rho\sigma} = \begin{cases} -1, & \text{for } [\mu\nu\rho\sigma] \text{ being an odd permutation of } 0123, \\ +1, & \text{for } [\mu\nu\rho\sigma] \text{ being an even permutation of } 0123, \\ 0, & \text{otherwise.} \end{cases}$$

The commutator of two objects $A$, $B$ for which multiplication and addition are defined is denoted by

$$[A, B] = A \cdot B - B \cdot A .$$

The anti-commutator is denoted by

$$\{A, B\} = A \cdot B + B \cdot A .$$






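A.1 Dirac Matrices

The Dirac matrices $\gamma_\mu$, $\mu = 0, \dots, 3$, in Minkowski space are defined by

$$\{\gamma_\mu, \gamma_\nu\} = 2 g_{\mu\nu} .$$

The matrix $\gamma_5$ is defined by

$$\gamma_5 = \gamma^5 = i\gamma^0\gamma^1\gamma^2\gamma^3 = -\frac{i}{4!}\,\epsilon_{\mu\nu\rho\sigma}\gamma^\mu\gamma^\nu\gamma^\rho\gamma^\sigma .$$

It satisfies $\gamma_5^2 = 1$ and $\{\gamma_5, \gamma_\mu\} = 0$. The chirality projectors $P_{R,L}$ are given by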



(A.4) 
(A.5) 

(A.6) 
(A.7) 



When performing the Wick rotation, the Euclidean Dirac matrices are given by 

{l^lA = 25^. 
They are related via j2U 

71,2,3 = —171,2,3, 7o = —70 ■ 
The Euclidean 75-matrix is given by 75 = 71727370- 

The matrices with smallest dimension satisfying l|A.4j) are 4x4 matrices ^I] . The representation of the 
$\gamma_\mu$-matrices employed in this thesis has been chosen to be the "chiral" one 1 , where the Minkowski-space
matrices are given by
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The Euclidean 7^ matrices are then given by 
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It should be remarked that this is different from the representation which has been employed in the standard TAD- 
libraries 12351 . where the Dirac form has been used 
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(A.9) 



-1, 

The contraction of a four-vector with the Dirac matrices is denoted by 



(A.10) 
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B Groups and Algebras 



One of the central concepts of particle physics is the notion of symmetry groups. All particles transform
according to a symmetry of space-time and the symmetries of the Lagrangian. In the relativistic case
this will mean that they transform as representations of the proper orthochronous Poincaré group, see
below.

B.1 Groups and Representations

A group is a pair $(G, \cdot)$ of a set $G$ and a binary operation $\cdot$, satisfying the following axioms:

1. The operation $\cdot$ is associative, i.e. $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ holds for all $x, y, z \in G$.

2. There is a unit element $e \in G$ satisfying $x \cdot e = e \cdot x = x$ for all $x \in G$.

3. For every $x \in G$ there exists an $x^{-1} \in G$ such that $x \cdot x^{-1} = x^{-1} \cdot x = e$.

The group is called Abelian (or commutative) if additionally $x \cdot y = y \cdot x$ holds for all $x, y \in G$.
It follows immediately that the unit element is uniquely determined. Furthermore it follows that the
inverse element $x^{-1}$ for each $x$ is unique.

A representation $\mathfrak{R}(G)$ of the group $G$ is a group homomorphism from $G$ to the group of vector space
endomorphisms of a representation space $V$, $\mathfrak{R}(G) : g \mapsto M(g)$, $g \in G$ and $M(g) \in \mathrm{End}(V)$, with the following
properties:

1. $M(g) \cdot M(h) = M(g \cdot h)$, i.e. the representation respects the group multiplication of $G$,

2. $M(1) = 1$, i.e. the image of the unit element in $G$ is the identity on $V$,

3. $M(g^{-1}) = M^{-1}(g)$, i.e. the image of the inverse element is the inverse of the image of the group element.

A representation is called irreducible if it cannot be written as the direct sum of other representations.
Thus, there are no invariant subspaces under the action of the $M(g)$ for all $g \in G$. In the following, a
matrix acting on $V$ (with an appropriate basis) with the above properties will be called a representation of $G$.
Particles, as observed in nature, should certainly be independent of the way we choose our coordinate
system, i.e. how we choose the basis for the representation space $V$ (this requirement parallels
the requirement of the theories to be coordinate invariant). Thus, they should always be classified
by irreducible representations of a group. These irreducible representations also go under the name
multiplet.

Of particular interest to physics are the Lie groups, see e.g. the textbook [244]. A Lie group is a group for
which the multiplication law and taking the inverse are smooth functions. Thus, the group space must
be a manifold and one can form the tangent space at any point in the group. The tangent space at
the unit element is called the Lie algebra of the group. A basis of the Lie algebra is called the set of
generators of the group. Accordingly, an element $a$ of a Lie group $L$ can be written in terms of the
generators $\{g_i\}$, $i = 1, \dots, N$, as:




(B.l) 
where the element $a$ is parameterized using the $\{\theta_i\}$ as coordinates. The dimension of a Lie group is
thus the dimension of the underlying manifold of the group space. A Lie algebra can be specified by
the structure constants $f_{abc}$, which are defined via

$$[g_a, g_b] = f_{abc}\, g_c . \tag{B.2}$$

For the integration over the group space, there exists a unique measure on $G$ called the Haar measure,
$dU$, which obeys:

1. Consider a function $f : G \to \mathbb{C}$. Then $dU$ obeys for all $V \in G$

$$\int_G dU\, f(U) = \int_G dU\, f(V \cdot U) = \int_G dU\, f(U \cdot V) .$$

2. The integral is normalized, i.e. $\int_G dU = 1$.

It satisfies

$$\int_G dU\, f(U) = \int_G dU\, f(U^{-1}) .$$

The rank of a group is the number of generators that simultaneously commute among themselves. It is
thus the maximum number of generators which can simultaneously be diagonalized.
A Lie algebra $\mathfrak{g}$ is called semisimple if for every $z \in \mathfrak{g}$ there are $x, y \in \mathfrak{g}$ with $z = [x, y]$. It can be shown
that for any compact Lie group, the algebra can always be written as the direct sum of a semisimple
Lie algebra and an Abelian one. The semisimple Lie algebras can be decomposed into a set of algebras
which are called simple. The latter cannot be written as sums of anything else. The simple algebras fall
into the following categories 1 :

1. The algebra $\mathfrak{sl}_N(\mathbb{C})$, the $N \times N$ complex matrices with vanishing trace. The compact real form
of $\mathfrak{sl}_N(\mathbb{C})$ is $\mathfrak{su}(N)$ and the corresponding Lie group is SU($N$), the $N \times N$ unitary matrices with
unit determinant. In the case $N = 1$, we speak of the group U(1), which consists of the complex
numbers on the unit circle.

2. The Lie algebra $\mathfrak{so}_{2N+1}(\mathbb{C})$, the $(2N + 1) \times (2N + 1)$ skew-symmetric complex matrices with
vanishing trace. The compact real form is $\mathfrak{so}(N)$, and the Lie group generated is SO($N$), the
$N \times N$ real, orthogonal matrices with determinant one. They form the rotation group in $N$-dimensional
Euclidean space. The rotation group in Minkowski space, whose metric changes sign
on the diagonal, is usually denoted by SO(3, 1), but still belongs to this category.

3. The Lie algebra $\mathfrak{sp}_N(\mathbb{C})$, the $2N \times 2N$ complex matrices of the form

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} ,$$

where $B$ and $C$ are symmetric, and $D$ is the negative transpose of $A$. The compact real form is
$\mathfrak{sp}(N)$ and the Lie group is Sp($N$), which forms the group of $N \times N$ quaternionic matrices which
preserve the inner product on the space $\mathbb{H}^N$ of $N$-tuples of quaternions.

4. The Lie algebra $\mathfrak{so}_{2N}(\mathbb{C})$, the $(2N) \times (2N)$ skew-symmetric complex matrices with vanishing trace.
This is the even-dimensional analog of item 2. These groups are to be distinguished,
since the physics in these cases may differ.

1 A compact and readable introduction to the subject of simple, finite groups and this classification can also be found in
http://math.ucr.edu/home/baez/week63.html, http://math.ucr.edu/home/baez/week64.html, and
http://math.ucr.edu/home/baez/week66.html
Apart from these classical algebras, there are also the exceptional groups $G_2$, $F_4$, $E_6$, $E_7$, and $E_8$. Some of these also
have applications in physics; however, so far they are not considered to play any role for the purposes
of this thesis.

The dimensions and ranks of the three important kinds of semi-simple groups in this thesis are shown
in Tab. B.1.

  Group                Dimension              Rank
  SO($N$), $N$ even    $\frac{1}{2}N(N-1)$    $N/2$
  SO($N$), $N$ odd     $\frac{1}{2}N(N-1)$    $(N-1)/2$
  SU($N$)              $N^2 - 1$              $N - 1$

Table B.1: Most important semi-simple groups together with their dimension and rank. The table is
taken from [245].



B.2 The U(1) Group

The U(1) group is a special case of the SU($N$) groups. It consists of the complex numbers on
the unit circle. It is a commutative group since the complex numbers commute under multiplication.

B.3 The SU($N$) Groups

The SU($N$) groups consist of elements isomorphic to the $N \times N$ unitary matrices with unit determinant:

$$U \cdot U^\dagger = U^\dagger \cdot U = 1 , \qquad \det U = 1 . \tag{B.3}$$

Obviously the matrices $U$ in (B.3) already form the fundamental representation of the SU($N$) group.
In this thesis the groups SU(2) and SU(3) play a central role. The generators chosen for the specific
realizations used in this thesis are listed in the following sections.

B.3.1 The SU(2) Group 

The standard choice for the generators of the SU(2) group are the Pauli matrices:

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{B.4}$$

The peculiarity of SU(2) is that these matrices together with the unit matrix,

$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} ,$$

form a basis of the complex $2 \times 2$ matrices. The SU(2) matrices form a hypersurface in the space
of complex $2 \times 2$ matrices on which the expansion coefficients are real. A consequence of this observation
is that any sum of SU(2) matrices is again proportional to an SU(2) matrix. This property only exists
in the case $N = 2$. The proportionality factor can be computed by considering the inverse of a matrix
$A$,

$$A = a_0 + i \sum_{r=1}^{3} \sigma_r a_r = \begin{pmatrix} a_0 + i a_3 & a_2 + i a_1 \\ -a_2 + i a_1 & a_0 - i a_3 \end{pmatrix} .$$

Then the inverse is given by

$$A^{-1} = \frac{1}{\det A} A^\dagger = \frac{1}{\det A} \begin{pmatrix} a_0 - i a_3 & -a_2 - i a_1 \\ a_2 - i a_1 & a_0 + i a_3 \end{pmatrix} .$$

Consequently, the proportionality factor is given by $k = \sqrt{\det A}$, i.e. the matrix

$$B = A/\sqrt{\det A} \tag{B.5}$$

is an SU(2) matrix.
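The projection of a sum of SU(2) matrices back onto the group can be checked numerically. The following sketch uses the parameterization $a_0 + i \sum_r \sigma_r a_r$ with real coefficients; the helper names (`su2`, `det`, `dagger`, `matmul`) and the coefficient values are illustrative assumptions.

```python
import cmath
import math

def su2(a0, a1, a2, a3):
    """Matrix a0 + i * sum_r sigma_r a_r, normalized so that det = 1."""
    norm = math.sqrt(a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3)
    a0, a1, a2, a3 = (a / norm for a in (a0, a1, a2, a3))
    return [[complex(a0, a3), complex(a2, a1)],
            [complex(-a2, a1), complex(a0, -a3)]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

u = su2(1.0, 2.0, -0.5, 0.3)
v = su2(-0.4, 0.1, 0.9, 1.5)

# The sum of two SU(2) matrices is in general not unitary ...
a = [[u[i][j] + v[i][j] for j in range(2)] for i in range(2)]

# ... but dividing by sqrt(det A) projects it back onto the group.
k = cmath.sqrt(det(a))
b = [[a[i][j] / k for j in range(2)] for i in range(2)]
bb = matmul(b, dagger(b))   # should be the unit matrix
```

This property is what makes the sum of neighbouring link products ("staples") easy to handle in SU(2) updating schemes, whereas for SU(3) no such simple projection exists.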

From Tab. B.1 it follows that the group has rank one, thus the representations correspond to the
eigenvalues of a single operator. Usually, the eigenvalues of $\sigma_3$ are taken to classify the multiplets.
SU(2) is locally isomorphic to SO(3) [T2|, which means that the algebras of the two groups are identical, 
although this does not hold for their global topology. To be specific, SU(2) is the double-cover of SO (3). 
While the latter is not simply connected, the former is. 

B.3.2 The SU(3) Group 

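In this thesis the Gell-Mann matrices $\{\lambda_i\}$, $i = 1, \dots, 8$, as defined in [21] have been chosen as the
generators of the SU(3) group:

$$\lambda_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} , \quad \lambda_2 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} , \quad \lambda_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,$$

$$\lambda_4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} , \quad \lambda_5 = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix} , \quad \lambda_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} ,$$

$$\lambda_7 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix} , \quad \lambda_8 = \begin{pmatrix} 1/\sqrt{3} & 0 & 0 \\ 0 & 1/\sqrt{3} & 0 \\ 0 & 0 & -2/\sqrt{3} \end{pmatrix} . \tag{B.6}$$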

B.4 The Poincaré Group

The space-time manifold underlying the physical theories discussed in this thesis is given by the
Minkowski-space. The metric is pseudo-Euclidean and can be transformed globally to the form (A.2),

$$g_{\mu\nu} = \begin{pmatrix} 1 & & & \\ & -1 & & \\ & & -1 & \\ & & & -1 \end{pmatrix} .$$

An event is associated with a point in space-time, and the distance between two events is defined as

$$(x - y)^2 = (x - y)^\mu (x - y)^\nu g_{\mu\nu} . \tag{B.7}$$

Here and in the following the Einstein summation convention that identical indices are to be summed
over is understood. The quantity $a^\mu a^\nu g_{\mu\nu} = a^\mu a_\mu$ is called the norm of $a^\mu$. This norm, however, is not
positive definite. Depending on the sign of $a^2$, one defines the following classes of vectors:

Timelike region: If $a^2 = (x - y)^2 > 0$, the distance is called timelike. In such a case, the two events at $x^\mu$
and $y^\mu$ may have a causal influence on each other and there exists a unique Lorentz transformation
which reduces the spatial components of $a^\mu$ to zero. However, there is no transformation which
rotates the $a^0$ component to 0.

Spacelike region: If $a^2 = (x - y)^2 < 0$, the distance is spacelike. In this case, two events at $x^\mu$ and
$y^\mu$ cannot be causally related. This requirement is equivalent to the colloquial saying that "no
information can travel faster than the speed of light". There is a unique Lorentz transformation
which rotates the $a^0$ component to 0, but there is none which reduces the spatial components of
$a^\mu$ to zero.

Lightlike region: If $a^2 = (x - y)^2 = 0$, the distance between $x^\mu$ and $y^\mu$ is lightlike. The two events can
be causally related if the interaction happens by exchanging information using massless particles
traveling at the velocity $c$.
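The three classes can be told apart by the sign of the invariant distance alone. A minimal sketch with the metric diag(1, -1, -1, -1) and $c = 1$ follows; the function names are illustrative and the events are given as tuples with the time component first.

```python
def interval(x, y):
    """(x - y)^2 with metric diag(1, -1, -1, -1); index 0 is time."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return d[0] ** 2 - d[1] ** 2 - d[2] ** 2 - d[3] ** 2

def classify(x, y):
    """Return the causal class of the separation between two events."""
    s2 = interval(x, y)
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

print(classify((2, 0, 0, 0), (0, 1, 0, 0)))
print(classify((0, 0, 0, 0), (1, 2, 0, 0)))
print(classify((1, 1, 0, 0), (0, 0, 0, 0)))
```

Integer coordinates are used in the example so that the lightlike case is hit exactly; with floating-point input a tolerance would be needed.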

The Poincaré group consists of the four-dimensional rotations in Minkowski space, the group SO(3, 1),
and the translation group. An element of the Poincaré group is denoted by $(\Lambda^\mu{}_\nu, a^\mu)$ and transforms a
four-vector $x^\mu$ in the following manner:

$$x'^\mu = \Lambda^\mu{}_\nu x^\nu + a^\mu . \tag{B.8}$$

The inverse transformation of $(\Lambda, a)$ is given by $(\Lambda^{-1}, -\Lambda^{-1} a)$. The multiplication law is given by

$$(\Lambda_1, a_1) \cdot (\Lambda_2, a_2) = (\Lambda_1 \Lambda_2, a_1 + \Lambda_1 a_2) .$$

Thus, the set of Poincaré transformations forms a non-Abelian group.
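The multiplication law and the inverse can be verified in a reduced setting. The sketch below restricts itself to 1+1 dimensions, where the Lorentz part is a single boost; this restriction and all function names are assumptions made for brevity. Even though boosts along one axis commute among themselves, the combination with translations is already non-Abelian.

```python
import math

def boost(rapidity):
    """Lorentz boost in 1+1 dimensions acting on (t, x)."""
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return [[c, s], [s, c]]

def apply(lam, v):
    return [sum(lam[i][j] * v[j] for j in range(2)) for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def compose(g1, g2):
    """(L1, a1) . (L2, a2) = (L1 L2, a1 + L1 a2)."""
    (l1, a1), (l2, a2) = g1, g2
    shifted = apply(l1, a2)
    return (matmul(l1, l2), [a1[i] + shifted[i] for i in range(2)])

def inverse(g):
    """(L, a)^-1 = (L^-1, -L^-1 a); the inverse boost has opposite rapidity."""
    lam, a = g
    lam_inv = [[lam[0][0], -lam[0][1]], [-lam[1][0], lam[1][1]]]
    back = apply(lam_inv, a)
    return (lam_inv, [-back[0], -back[1]])

def act(g, x):
    """x' = L x + a."""
    lam, a = g
    lx = apply(lam, x)
    return [lx[i] + a[i] for i in range(2)]

g1 = (boost(0.3), [1.0, 0.0])
g2 = (boost(-0.7), [0.0, 2.0])
x = [0.5, -1.5]

both = act(compose(g1, g2), x)        # equals act(g1, act(g2, x))
unit = compose(g1, inverse(g1))       # equals the identity element
```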

According to the postulates of special relativity, the coordinate transformations are linear and real.
When changing the frame of reference, the distance of two events will be unchanged, which implies that
the norm of a vector is conserved. This means that for $a^\mu = 0$ (this subgroup is called the homogeneous
Poincaré group):

$$\Lambda^\mu{}_\nu = \Lambda^{*\mu}{}_\nu , \qquad \Lambda^\mu{}_\alpha \Lambda_\nu{}^\alpha = \delta^\mu{}_\nu . \tag{B.9}$$

Consequently, one finds

$$\det \Lambda^\mu{}_\nu = \pm 1 ,$$

and one can distinguish four kinds of transformations as displayed in Tab. B.2. From the four subsets,
only the proper, orthochronous set contains the unit element and is therefore the only subgroup. This
subgroup is connected, while the entire homogeneous Poincaré group is not connected.



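  $\det \Lambda$   $\Lambda^0{}_0$   Category
  +1               $\geq +1$         proper, orthochronous
  -1               $\geq +1$         improper, orthochronous
  +1               $\leq -1$         proper, non-orthochronous
  -1               $\leq -1$         improper, non-orthochronous

Table B.2: All four kinds of homogeneous Poincaré transformations compatible with (B.9).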



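The Poincaré group has six generators for rotations in the $\mu$-$\nu$-plane, $L_{\mu\nu}$ (which are antisymmetric,
$L_{\mu\nu} = -L_{\nu\mu}$), and four generators $P_\mu$ for translations. Their commutation relations give rise to the
Poincaré algebra [245]:

$$[L_{\mu\nu}, L_{\rho\sigma}] = i \left( g_{\nu\rho} L_{\mu\sigma} - g_{\mu\rho} L_{\nu\sigma} - g_{\nu\sigma} L_{\mu\rho} + g_{\mu\sigma} L_{\nu\rho} \right) ,$$
$$[L_{\mu\nu}, P_\rho] = i \left( -g_{\mu\rho} P_\nu + g_{\nu\rho} P_\mu \right) ,$$
$$[P_\mu, P_\nu] = 0 . \tag{B.10}$$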






The algebra admits a representation in Minkowski space in terms of differential operators:

$$P_\mu = i \partial_\mu . \tag{B.11}$$

Defining the Pauli-Lubanski vector by

$$W^\mu = \frac{1}{2} \epsilon^{\mu\nu\rho\sigma} P_\nu L_{\rho\sigma} , \tag{B.12}$$

one finds [245] that the Poincaré group has two Casimir operators: $P_\mu P^\mu$ and $W_\mu W^\mu$. This allows one to
classify all irreducible representations [22]:

$P^\mu P_\mu = m^2 > 0$, $P_0 > 0$: The energy states lie on the hyperboloid in the forward light cone. This
describes massive particles with spin $s$, $|m, s\rangle$, $s = 0, 1/2, 1, 3/2, \dots$,

$P^\mu P_\mu = 0$, $P_0 > 0$: The energy states lie on the forward cone. This describes massless particles with
helicities $h$, $|h\rangle$, $h = \pm s$, where $s$ counts as above,

$P_\mu = 0$: This is the single point at the origin.

$P^\mu P_\mu = 0$, $P_0 < 0$: The energy states lie on the surface of the backward light cone. The quantum
number $s$ is continuous.

$P^\mu P_\mu = m^2 > 0$, $P_0 < 0$: The energy states lie on the hyperboloid in the backward light cone.

$P^\mu P_\mu = -k^2 < 0$ ($k \in \mathbb{R}$): The particles lie on a spacelike hyperboloid. This would describe tachyonic
particles with velocities greater than $c$.

Only the first two classes are realized for observable particles in nature. If there was no lower bound to 
the energy of a particle as it would be the case if the last class did correspond to any physical particle, 
an arbitrary amount of energy could spontaneously be created from any point in spacetime. According 
to the rules of quantum mechanics this would happen with finite probability. Thus, this possibility 
seems to be incompatible with the formulations of quantum field theories known so far. 

B.5 Spin-Statistics Theorem 

An important relation between the particles of different spins is the spin- statistics theorem |22l I18| . 
For a relativistic quantum field theory the observable particles (i.e. the physical states) must have the
following properties if one requires that causality holds: Taking $(x - y)^2 < 0$ to be a spacelike distance
in Minkowski space, the fields $\Phi(x)$ must satisfy the following (anti-)commutativity relations:

Bose fields: $[\Phi(x), \Phi(y)] = 0$, if the fields $\Phi(x)$ transform according to a particle with integer spin. The
particles described by $\Phi(x)$ are called bosons. Consequently, a single state may be occupied by
an arbitrary number of bosons.

Fermi fields: $\{\Phi(x), \Phi(y)\} = 0$, if the fields described by $\Phi(x)$ transform as a representation with half-integer
spin. The corresponding particles are called fermions and a single state may be occupied by
at most one fermion.

B.6 Grassmann Algebras 

As has been noted in Sec. B.5, the fields describing fermions anticommute for spacelike distances.
The anticommutativity is an essential property of Grassmann fields. Thus, Grassmann algebras are 
an important ingredient for the description of fermionic degrees of freedom. The discussion follows 
Ref. EH]. 
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B.6.1 Definitions 

Consider a map from $p$ vectors in $\mathbb{C}^N$, $\{u_i\}$, $i = 1, \dots, p$, onto the complex numbers,

$$S : \underbrace{\mathbb{C}^N \otimes \cdots \otimes \mathbb{C}^N}_{p} \to \mathbb{C} : \quad (u_1, \dots, u_p) \mapsto \mathbb{C} .$$

$S(u_1, \dots, u_p)$ is called $p$-linear if $S$ is separately linear in each argument. It is called antisymmetric if,
for any permutation $\pi$ of $\{1, \dots, p\}$, we have

$$S(u_{\pi(1)}, \dots, u_{\pi(p)}) = \mathrm{sgn}(\pi)\, S(u_1, \dots, u_p) ,$$

where $\mathrm{sgn}(\pi)$ denotes the signature of the permutation $\pi$.

Now we consider the space $\Lambda^p(\mathbb{C}^N)$ of $p$-linear antisymmetric functions on $\mathbb{C}^N$. By definition, we set
$\Lambda^0(\mathbb{C}^N) = \mathbb{C}$. For $p \geq 1$, one finds

$$\dim \Lambda^p(\mathbb{C}^N) = \binom{N}{p} , \quad 0 \leq p \leq N ,$$
$$\Lambda^p(\mathbb{C}^N) = 0 , \quad p > N .$$

The Grassmann product map assigns to any two vectors $S \in \Lambda^p$ and $T \in \Lambda^q$ a vector $S \wedge T \in \Lambda^{p+q}$ via

$$S \wedge T(u_1, \dots, u_{p+q}) = \frac{1}{p!\, q!} \sum_\pi \mathrm{sgn}(\pi)\, S(u_{\pi(1)}, \dots, u_{\pi(p)})\, T(u_{\pi(p+1)}, \dots, u_{\pi(p+q)}) . \tag{B.13}$$

The Grassmann product is associative,

$$R \wedge (S \wedge T) = (R \wedge S) \wedge T ,$$

and the commutation law becomes

$$S \wedge T = (-1)^{pq}\, T \wedge S . \tag{B.14}$$

The direct sum of vector spaces,

$$\Lambda(\mathbb{C}^N) = \bigoplus_{p=0}^{N} \Lambda^p(\mathbb{C}^N) ,$$

together with the Grassmann product Eq. (B.13) forms a graded algebra, called the Grassmann algebra
over $\mathbb{C}^N$. An element of $\Lambda(\mathbb{C}^N)$ can always be written as a sum $S_0 + S_1 + \dots + S_N$ such that $S_p \in \Lambda^p(\mathbb{C}^N)$.
The dimension of the algebra is given by

$$\dim \Lambda(\mathbb{C}^N) = 2^{\dim \mathbb{C}^N} = 2^N .$$

$\Lambda$ may be decomposed into an even and an odd part,

$$\Lambda = \Lambda_+ \oplus \Lambda_- ,$$
$$\Lambda_+ = \Lambda^0 \oplus \Lambda^2 \oplus \dots , \quad \text{(even subspace)},$$
$$\Lambda_- = \Lambda^1 \oplus \Lambda^3 \oplus \dots , \quad \text{(odd subspace)}. \tag{B.15}$$

Using the decomposition (B.15) allows one to write the product rule (B.14) as follows:

$$S \wedge T = \begin{cases} T \wedge S & \text{if } S \in \Lambda_+ \text{ or } T \in \Lambda_+ , \\ -T \wedge S & \text{if both } S, T \in \Lambda_- . \end{cases} \tag{B.16}$$

Thus, the even part $\Lambda_+$ is a commutative subalgebra.
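The graded commutation law and the dimension of the algebra can be illustrated with a small toy implementation. In this sketch, an element is stored as a dictionary that maps an ordered tuple of generator indices (a monomial in anticommuting generators) to its coefficient; this data layout and the function names are assumptions chosen for the example and are not taken from any library.

```python
from itertools import combinations

def monomial_product(a, b):
    """Product of ordered monomials a, b (tuples of generator indices).

    Returns (sign, sorted_tuple); the sign is 0 if a generator repeats.
    """
    if set(a) & set(b):
        return 0, ()
    seq = list(a + b)
    sign = 1
    # Bubble sort; every transposition of neighbours contributes a factor -1.
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                sign = -sign
    return sign, tuple(seq)

def wedge(s, t):
    """Grassmann product of two elements given as {monomial: coefficient}."""
    out = {}
    for ma, ca in s.items():
        for mb, cb in t.items():
            sign, m = monomial_product(ma, mb)
            if sign:
                out[m] = out.get(m, 0) + sign * ca * cb
    return {m: c for m, c in out.items() if c != 0}

N = 4
# One basis monomial for every subset of the N generators.
basis = [c for p in range(N + 1) for c in combinations(range(N), p)]

S = {(0,): 2.0, (1,): -1.0}       # degree p = 1 (odd)
T = {(1, 2): 3.0, (0, 3): 1.0}    # degree q = 2 (even)
U = {(2,): 1.0, (3,): 5.0}        # degree 1 (odd)
```

With these definitions, `wedge(S, T)` equals `wedge(T, S)` because one factor is even, while `wedge(S, U)` and `wedge(U, S)` differ by a sign, and `wedge(S, S)` vanishes.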

Let $\{e_i\}$, $i = 1, \dots, N$, be a basis of $\mathbb{C}^N$. Any vector $u \in \mathbb{C}^N$ then has the coordinates $\{u^i\}$. Then
define special elements $\eta^i \in \Lambda^1$ via

$$\eta^i(u) = u^i .$$

The following properties then express the fact that the $\{\eta^i\}$ generate the Grassmann algebra:

1. The $\{\eta^i\}$ anticommute: $\{\eta^i, \eta^j\} = 0$.

2. Each vector $S \in \Lambda^p$ may be represented as

$$S = \frac{1}{p!}\, s_{i_1 \dots i_p}\, \eta^{i_1} \cdots \eta^{i_p} ,$$

where $s_{i_1 \dots i_p}$ are complex expansion coefficients which are antisymmetric with respect to
permutations of their indices.

The above definitions still make sense when the limit $N \to \infty$ is considered. This is the interesting
situation when applying Grassmann variables to continuum field theories. However, when constructing
the Schwinger functions of Grassmann fields on the lattice, cf. Sec. 2.6.4, the behavior of the
fermionic degrees of freedom in the continuum limit will also matter.

B.6.2 Derivatives

The derivative, $\partial_u$, is a map



d u : A p (C> ^ AP-^C"), 



P>0, 



which is given by 




(B.17) 



with respect to the basis {rf}. It obeys the following rules 




3. 9 {r] k S) = S k S _ 11 ka s _ 



B.6.3 Integration 




It is straightforward to prove the following rules:



1. The relation between integration and differentiation is given by

2. Integration by parts is performed via ($S \in \Lambda^p$, $T \in \Lambda$)

$$\int [d\eta]\, (\partial_u S) \wedge T = (-1)^{p+1} \int [d\eta]\, S \wedge \partial_u T.$$

3. Consider a linear transformation $a : \mathbb{C}^N \to \mathbb{C}^N$ of the coordinates $\{u^i\}$ of $S$. Then the following
rule holds:

$$\int [d\eta]\, S(au) = \det a \int [d\eta]\, S(u).$$

This integral is the counterpart of the corresponding integral in a real vector space,

$$\int dx\, f(ax) = |\det a|^{-1} \int dx\, f(x).$$

4. The exponential integral of the linear transformation $a : \mathbb{C}^N \to \mathbb{C}^N$ is given by

1 [d77][dC]exp|- |^ WC* j = (-1)^ det(-o). (B.18) 

This rule is again the counterpart of the exponential integral in a real vector space. However, in 
the latter case, the integral only exists for a positive definite transformation a, while the former 
exists for any a. 

In fact, the generating functional (2.39) for bosonic fields can be generalized to an integral over Grassmann
fields $\{\eta^i\}$ if fermions are considered. Then Eq. (3.34) is the central tool for evaluating the path
integral on a finite lattice Zq. It should be pointed out that the sign factor in Eq. (B.18) drops out in
the case of Dirac fermions since a Dirac spinor is composed of two Weyl spinors which are separately
described by Grassmann variables. This in turn implies that $N$ will always be even in case of Dirac
fermions. Thus, the overall sign is +1.

C Local Forms of Actions Used



For the local updating algorithms on the lattice discussed in Sec. 3.4, the lattice actions have to be
cast into a form where the contribution of a single site factorizes from the contributions of the other
sites. This is not possible for all actions, but in many cases it is possible to find an approximate
action which fulfills the above condition and which has sufficient overlap with the original action under
consideration. This idea is in fact the basis of Lüscher's original proposal for a multiboson algorithm.
After the action has been rearranged in the form above, the local "staples" can be used for the
local updating algorithms.



C.1 General Expressions



Consider a lattice action of the following general form

$$S = \sum_{i=1}^{M} \sum_{x} a_i\, \phi(f_i^1(x)) \cdots \phi(f_i^{n_i}(x)), \tag{C.1}$$

i.e. on a given space $\Omega$ with coordinate vectors denoted by $x \in \Omega$ we have a discretized field $\{\phi(x)\}$. The
action is given by a sum of $M$ terms containing products of the field $\{\phi(x)\}$ such that each coordinate
appears in the action only once; i.e. the functions of the coordinates $\{f_k^r(x)\}$ (with $k = 1 \ldots M$, and
$r = 1 \ldots n_k$, $f_k^r : \Omega \to \Omega$) must be distinct:

$$f_k^i(x) \neq f_k^j(x) \quad \forall\, i \neq j,\ x \in \Omega; \quad i, j = 1, \ldots, n_k. \tag{C.2}$$

Furthermore the functions $\{f_k^r(x)\}$ must be invertible.

Then we can choose the functions $\{f_k^r(x)\}$ such that $f_k^1(x) = x$ without loss of generality. If the action
contains $N$ different fields $\phi_k(x)$, $k = 1, \ldots, N$, each field type $\phi_j(x)$ has to be considered separately in
Eq. (C.1). The other fields $\phi_{k \neq j}(x)$ are then contained in the constants $a_i$.

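From Eq. (C.1) we can compute the staples of the action, i.e. the change $\Delta S$ in the action $S$ if we vary
the field $\{\phi(x)\}$ at a single point $y$ by $\Delta\phi(y)$, with the following formula:

$$\Delta S[\Delta\phi(y)] = \Delta\phi(y) \sum_{i=1}^{M} a_i \Big[ \phi(f_i^2(y)) \cdots \phi(f_i^{n_i}(y)) + \sum_{p=2}^{n_i} \phi\big(f_i^1((f_i^p)^{-1}(y))\big) \cdots \underbrace{\phi\big(f_i^p((f_i^p)^{-1}(y))\big)}_{\text{omitted}} \cdots \phi\big(f_i^{n_i}((f_i^p)^{-1}(y))\big) \Big]. \tag{C.3}$$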
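The staple formula can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the implementation used in this work: it takes a one-dimensional periodic lattice, coordinate functions that are plain shifts $f_i^r(x) = x + s_r$ with $s_1 = 0$, and integer field values so that the comparison is exact. The constants in `TERMS` and the names `action` and `staple` are hypothetical.

```python
import random

L = 8
# Each term is (a_i, [s_1, s_2, ...]) with f_i^r(x) = (x + s_r) mod L and s_1 = 0.
# The shifts within one term are distinct, as required by the condition (C.2).
TERMS = [(2, [0, 1]), (-3, [0, 1, 2])]

def action(phi):
    """Evaluate the action as the sum over terms and sites of a_i * prod_r phi(f_i^r(x))."""
    total = 0
    for a, shifts in TERMS:
        for x in range(L):
            prod = a
            for s in shifts:
                prod *= phi[(x + s) % L]
            total += prod
    return total

def staple(phi, y):
    """Coefficient multiplying the change of phi(y) in the change of the action."""
    coeff = 0
    for a, shifts in TERMS:
        for p, s_p in enumerate(shifts):
            x = (y - s_p) % L            # x is the inverse of f_i^p applied to y
            prod = a
            for r, s_r in enumerate(shifts):
                if r != p:               # the factor phi(y) itself is omitted
                    prod *= phi[(x + s_r) % L]
            coeff += prod
    return coeff

random.seed(1)
phi = [random.randint(-5, 5) for _ in range(L)]
y, dphi = 3, 7

changed = list(phi)
changed[y] += dphi
delta_direct = action(changed) - action(phi)   # full recomputation
delta_staple = dphi * staple(phi, y)           # local staple formula
```

Because every product contains the field at site $y$ at most once, the action is linear in that value and the two numbers agree exactly, while the staple only touches a few neighbouring sites.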



The above form may also be generalized to the case where $\phi(x)$ denotes a field with several components,
e.g. a complex $3 \times 3$ matrix in the case of gluon fields $U_\mu(x)$. The action (C.1) will then be the trace
over the resulting matrix; however, equation (C.3) will have to be modified to account for the
non-commutativity of the fields. Since the trace is not invariant under commutation, but under cyclic
permutations, the expression reads

$$\begin{aligned}
\Delta S[\Delta\phi(y)] = \Delta\phi(y) \sum_{i=1}^{M} a_i \Big[ & \phi(f_i^2(y)) \cdots \phi(f_i^{n_i}(y)) \\
& + \sum_{p=2}^{n_i} \phi\big(f_i^{p+1}((f_i^p)^{-1}(y))\big) \cdots \phi\big(f_i^{n_i}((f_i^p)^{-1}(y))\big) \\
& \qquad \times \phi\big(f_i^1((f_i^p)^{-1}(y))\big) \cdots \phi\big(f_i^{p-1}((f_i^p)^{-1}(y))\big) \Big].
\end{aligned} \tag{C.4}$$

There is another important situation where the field $\{\phi(x)\}$ at site $x$ appears quadratically in the lattice
action. In this case the action can be rewritten as a Gaussian and the heatbath algorithm discussed in 
Sec. 15X51 can immediately be applied. Such an action will have the following form: 

$$\begin{aligned}
S[\phi(x)] &= a_1\, \phi(x)^2\, \phi(f_1^2(x)) \cdots \phi(f_1^{n_1}(x)) + \sum_{i=2}^{M} a_i\, \phi(x)\, \phi(f_i^2(x)) \cdots \phi(f_i^{n_i}(x)) \\
&= \tilde a_1 \Big( \phi(x) + \frac{1}{2}\, \tilde a_1^{-1} \sum_{i=2}^{M} \tilde a_i \Big)^2 + (\text{independent of } \phi(x)),
\end{aligned} \tag{C.5}$$

where $\tilde a_i = a_i\, \phi(f_i^2(x)) \cdots \phi(f_i^{n_i}(x))$. The remaining terms independent of $\phi(x)$ are of no importance
for the updating algorithm and their precise form does not matter. The case where $\phi(x)$ is a complex
field or an $n$-component field (where the trace has to be taken to compute the action) is straightforward.
However, the matrix $\tilde a_1$ must be invertible for this method to work.

C.2 Local Forms of Various Actions 

To implement the local algorithms for gauge fields in Sec. 3.4, one has to find the plaquette staples
$S_\mu(x)$ for a given action. The local action then takes the form

$$\Delta S[\Delta U_\mu(x)] = -\frac{\beta}{N}\, \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(x)\, S_\mu(x). \tag{C.6}$$

In the following subsections, this form will be examined for the cases needed in this thesis. Please note
that in the following no implicit summation over the external index is to be performed.



C.2.1 Pure Gauge Fields 

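As a first example, consider the pure gauge action given by Eq. (2.67),

$$S[U(x)] = \beta \sum_{x} \sum_{\mu < \nu} \Big( 1 - \frac{1}{N}\, \mathrm{Re}\, \mathrm{Tr}\, U_{\mu\nu}(x) \Big),$$

with the plaquette $U_{\mu\nu}(x)$ given by

$$U_{\mu\nu}(x) = U_\mu(x)\, U_\nu(x + \hat\mu)\, U_\mu^\dagger(x + \hat\nu)\, U_\nu^\dagger(x).$$

Then one immediately finds for the local staple form of the action:

$$\Delta S[\Delta U_\mu(x)] = -\beta\, \frac{1}{N} \sum_{\nu \neq \mu} \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(x) \Big( U_\nu(x + \hat\mu)\, U_\mu^\dagger(x + \hat\nu)\, U_\nu^\dagger(x) + U_\nu^\dagger(x + \hat\mu - \hat\nu)\, U_\mu^\dagger(x - \hat\nu)\, U_\nu(x - \hat\nu) \Big). \tag{C.7}$$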
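The structure of the two staples in Eq. (C.7) can be verified on a model that is much simpler than the one treated here. The following sketch is an illustration only: it assumes a U(1) gauge field (so $N = 1$ and the trace is trivial) on a small two-dimensional periodic lattice, and compares the staple expression with a full recomputation of the plaquette action. The lattice size, the coupling `BETA` and all function names are hypothetical choices.

```python
import cmath
import random

L, BETA = 4, 2.3
random.seed(0)
sites = [(a, b) for a in range(L) for b in range(L)]
# U[mu][x] is the U(1) link variable leaving site x in direction mu.
U = [{s: cmath.exp(1j * random.uniform(-3.0, 3.0)) for s in sites} for _ in range(2)]

def shift(x, mu, n=1):
    """Return the site x + n * mu_hat with periodic boundary conditions."""
    y = list(x)
    y[mu] = (y[mu] + n) % L
    return tuple(y)

def plaquette(links, x, mu, nu):
    """U_mu(x) U_nu(x + mu) U_mu^dagger(x + nu) U_nu^dagger(x)."""
    return (links[mu][x] * links[nu][shift(x, mu)]
            * links[mu][shift(x, nu)].conjugate() * links[nu][x].conjugate())

def action(links):
    """Plaquette action beta * sum_x (1 - Re U_01(x)) in two dimensions."""
    return BETA * sum(1 - plaquette(links, x, 0, 1).real for x in sites)

def staple_sum(links, x, mu):
    """Sum of the upper and lower staple attached to the link U_mu(x)."""
    total = 0
    for nu in range(2):
        if nu == mu:
            continue
        xm = shift(x, nu, -1)
        total += (links[nu][shift(x, mu)] * links[mu][shift(x, nu)].conjugate()
                  * links[nu][x].conjugate())
        total += (links[nu][shift(xm, mu)].conjugate() * links[mu][xm].conjugate()
                  * links[nu][xm])
    return total

def delta_pair(x, mu, dU):
    """Return (change from full recomputation, change from the staple form)."""
    changed = [dict(U[0]), dict(U[1])]
    changed[mu][x] += dU
    direct = action(changed) - action(U)
    local = -BETA * (dU * staple_sum(U, x, mu)).real
    return direct, local

direct0, local0 = delta_pair((1, 2), 0, 0.3 - 0.2j)
direct1, local1 = delta_pair((3, 0), 1, -0.1 + 0.5j)
```

Since each link enters every plaquette at most once, the action is linear in the link variable and both numbers agree up to rounding. For the non-Abelian case the products become matrix products and the ordering of the factors in Eq. (C.7) matters.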
C.2.2 Lattice Fermion Fields

The Wilson matrix $Q(y, x)$ describing a single, massive fermion flavor is given by the expression (2.81):

$$Q(y, x) = \delta(y, x) - \kappa \sum_{\rho=0}^{3} \Big( U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \delta(y, x + \hat\rho) + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \delta(y, x - \hat\rho) \Big),$$

where $\kappa < \kappa_{\mathrm{crit}}$. Up to now the boundary conditions have been chosen implicitly to be periodic in the
lattice 1-, 2- and 3-directions and anti-periodic in the lattice 0-direction. For the actual implementation
of the local action, it is more convenient to impose periodic boundary conditions in all four lattice
directions, and consequently have a symmetric treatment of the lattice volume $\Omega$. Respecting the
anti-periodicity can be done by introducing an explicit factor which implements the anti-periodic boundary
conditions in the lattice 0-direction (also called T-direction). We define the fermionic sign function to
be:



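$$\theta_\rho(x) = 1 - 2\, \delta(\rho, 0)\, \delta(x_0, T_{\max}), \tag{C.8}$$

i.e. the function $\theta_\rho(x)$ is equal to $-1$ on the hyperslice with $x_0 = T_{\max}$ for $\rho = 0$ only and $+1$ everywhere
else. With this convention the Wilson matrix takes the form

$$Q(y, x) = \delta(y, x) - \kappa \sum_{\rho=0}^{3} \Big( U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \delta(y, x + \hat\rho)\, \theta_\rho(y - \hat\rho) + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \delta(y, x - \hat\rho)\, \theta_\rho(y) \Big). \tag{C.9}$$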

This staple can be used directly for the implementation on a computer. 
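The sign function (C.8) is simple enough to be written down directly. The sketch below is only an illustration; the function name `theta`, the tuple layout of the site coordinates and the temporal extent `T` are assumptions.

```python
def theta(rho, x, t_max):
    """Fermionic sign function: -1 for rho == 0 on the slice x[0] == t_max, +1 otherwise."""
    return 1 - 2 * int(rho == 0) * int(x[0] == t_max)

T = 4                       # hypothetical temporal extent, sites x0 = 0 .. T - 1
t_max = T - 1

# Product of the sign factors picked up by hopping once around the lattice
# in the temporal direction (rho = 0) and in a spatial direction (rho = 1).
temporal_loop = 1
for x0 in range(T):
    temporal_loop *= theta(0, (x0, 0, 0, 0), t_max)

spatial_loop = 1
for x1 in range(T):
    spatial_loop *= theta(1, (t_max, x1, 0, 0), t_max)
```

A closed loop winding once around the temporal direction collects exactly one factor of $-1$, which is how the explicit factor reproduces the anti-periodic boundary condition on a lattice that is otherwise treated periodically.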



Wilson Fermions (Hermitian) 

Using the Hermitian fermion matrix the fermionic energy is given by

$$S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\, \big( \tilde Q(y, z) - \rho_j^* \big) \big( \tilde Q(z, x) - \rho_j \big)\, \phi_j(x), \tag{3.67}$$

where the $\rho_j$ are the roots of the polynomial in (3.66). If one uses even-odd preconditioning, the
fermionic energy is given by Eq. (3.69):

$$S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\, \big( \tilde Q(y, z) - P_o\, \rho_j^* \big) \big( \tilde Q(z, x) - P_o\, \rho_j \big)\, \phi_j(x). \tag{3.69}$$

Inserting the Wilson matrix (2.81) into Eq. (3.67) one gets the action in the form of Eq. (C.1):

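$$\begin{aligned}
S_f &= \sum_j \sum_{xy} \phi_j^\dagger(y) \Big( \tilde Q^2(y, x) - (\rho_j^* + \rho_j)\, \tilde Q(y, x) + \rho_j^* \rho_j\, \delta(y, x) \Big) \phi_j(x) \\
&= \sum_j \sum_{xy} \phi_j^\dagger(y) \Big\{ \sum_z \gamma_5 \Big( \delta(y, z) - \kappa \sum_\rho \big[ U_\rho(y - \hat\rho)(1 + \gamma_\rho)\, \delta(y, z + \hat\rho)\, \theta_\rho(y - \hat\rho) + U_\rho^\dagger(y)(1 - \gamma_\rho)\, \delta(y, z - \hat\rho)\, \theta_\rho(y) \big] \Big) \\
&\qquad\qquad \times \gamma_5 \Big( \delta(z, x) - \kappa \sum_\sigma \big[ U_\sigma(z - \hat\sigma)(1 + \gamma_\sigma)\, \delta(z, x + \hat\sigma)\, \theta_\sigma(z - \hat\sigma) + U_\sigma^\dagger(z)(1 - \gamma_\sigma)\, \delta(z, x - \hat\sigma)\, \theta_\sigma(z) \big] \Big) \\
&\qquad - (\rho_j^* + \rho_j)\, \gamma_5 \Big( \delta(y, x) - \kappa \sum_\rho \big[ U_\rho(y - \hat\rho)(1 + \gamma_\rho)\, \delta(y, x + \hat\rho)\, \theta_\rho(y - \hat\rho) + U_\rho^\dagger(y)(1 - \gamma_\rho)\, \delta(y, x - \hat\rho)\, \theta_\rho(y) \big] \Big) \\
&\qquad + \rho_j^* \rho_j\, \delta(y, x) \Big\} \phi_j(x) \\
&= \sum_j \sum_y \phi_j^\dagger(y) \Big\{ \big( 1 + 16\kappa^2 + \rho_j^* \rho_j - (\rho_j^* + \rho_j)\, \gamma_5 \big) \phi_j(y) \\
&\qquad + \kappa \sum_\rho \Big[ \big( (\rho_j^* + \rho_j)\, \gamma_5 (1 + \gamma_\rho) - 2 \big)\, U_\rho(y - \hat\rho)\, \phi_j(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
&\qquad\qquad + \big( (\rho_j^* + \rho_j)\, \gamma_5 (1 - \gamma_\rho) - 2 \big)\, U_\rho^\dagger(y)\, \phi_j(y + \hat\rho)\, \theta_\rho(y) \Big] \\
&\qquad + \kappa^2 \sum_{\rho_1 \neq \rho_2} \Big[ U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}(y - \hat\rho_1 - \hat\rho_2)\, (1 - \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
&\qquad\qquad\quad \times \phi_j(y - \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1 - \hat\rho_2) \\
&\qquad\qquad + U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}^\dagger(y - \hat\rho_1)\, (1 - \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
&\qquad\qquad\quad \times \phi_j(y - \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1) \\
&\qquad\qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}(y + \hat\rho_1 - \hat\rho_2)\, (1 + \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
&\qquad\qquad\quad \times \phi_j(y + \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1 - \hat\rho_2) \\
&\qquad\qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}^\dagger(y + \hat\rho_1)\, (1 + \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
&\qquad\qquad\quad \times \phi_j(y + \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1) \Big] \Big\}.
\end{aligned}$$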



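This expression can be cast into the form (C.5) to yield

$$\begin{aligned}
S_f &= \sum_j \sum_y \mathrm{Re} \big( \phi_j^\dagger(y)\, A\, \phi_j(y) + \phi_j^\dagger(y)\, V_j(y) \big) \\
&= \sum_j \sum_y \Big( \phi_j^\dagger(y)\, A\, \phi_j(y) + \frac{1}{2} \big( \phi_j^\dagger(y)\, V_j(y) + V_j^\dagger(y)\, \phi_j(y) \big) \Big) \\
&= \sum_j \sum_y \Big( \phi_j^\dagger(y) + \frac{1}{2}\, V_j^\dagger(y)\, A^{-1} \Big)\, A\, \Big( \phi_j(y) + \frac{1}{2}\, A^{-1} V_j(y) \Big) + \text{indep. of } \phi_j(y).
\end{aligned} \tag{C.10}$$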

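With the chiral representation of the $\gamma$-matrices, cf. Eq. (A.9), the matrix $A^{-1}$ takes a very simple
form:

$$A = (1 + 16\kappa^2 + \rho_j^* \rho_j) - (\rho_j^* + \rho_j)\, \gamma_5,$$

$$A^{-1} = \frac{(1 + 16\kappa^2 + \rho_j^* \rho_j) + (\rho_j^* + \rho_j)\, \gamma_5}{(1 + 16\kappa^2 + \rho_j^* \rho_j)^2 - (\rho_j^* + \rho_j)^2}.$$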

A local boson field heatbath is then computed by (see also Eq. (3.50))

$$\phi_j'(y) = \eta_j(y) - A^{-1} V_j(y), \tag{C.11}$$

with $\eta_j(y)$ being a random number taken from a Gaussian distribution with unit width. A local boson
field overrelaxation is performed by

$$\phi_j'(y) = -\phi_j(y) - A^{-1} V_j(y). \tag{C.12}$$

In both cases, the order of the sites being updated matters.
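The overrelaxation step (C.12) reflects the field about the minimum of the local quadratic form, so the local action should be unchanged by it. The following sketch checks this for a single site. It is an illustration under assumptions that are not taken from the text: $\gamma_5$ is taken to be diagonal with entries $(1, 1, -1, -1)$ (the actual ordering depends on the convention of Eq. (A.9)), and the values of `KAPPA` and `RHO` as well as the random site data are hypothetical.

```python
import random

KAPPA = 0.16                       # hypothetical hopping parameter
RHO = complex(0.4, 0.3)            # hypothetical polynomial root rho_j
GAMMA5 = [1, 1, -1, -1]            # assumed diagonal gamma_5 in a chiral basis
N_COLOR = 3

# A = (1 + 16 kappa^2 + |rho|^2) - (rho^* + rho) gamma_5 is diagonal in this basis
# and acts trivially on the color index.
r1 = 1 + 16 * KAPPA ** 2 + abs(RHO) ** 2
r2 = 2 * RHO.real
A = [r1 - r2 * g for g in GAMMA5 for _ in range(N_COLOR)]

def local_action(phi, V):
    """Re(phi^dagger A phi + phi^dagger V) for one site."""
    return sum(a * abs(p) ** 2 + (p.conjugate() * v).real
               for a, p, v in zip(A, phi, V))

def overrelax(phi, V):
    """Overrelaxation step: phi' = -phi - A^{-1} V."""
    return [-p - v / a for a, p, v in zip(A, phi, V)]

random.seed(2)
phi = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in A]
V = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in A]

phi_new = overrelax(phi, V)
action_change = local_action(phi_new, V) - local_action(phi, V)
phi_back = overrelax(phi_new, V)   # applying the step twice restores phi
```

The step changes the field but not the local action, and applying it twice returns the original field. In a real sweep the vector $V_j(y)$ depends on the neighbouring sites, which is why the order of the updates matters.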

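For the local gauge field updates, expression (C.10) has to be cast into the form (C.4). Then $\Delta S_f[\Delta U_\mu(y)]$
takes the form

$$\begin{aligned}
\Delta S_f[\Delta U_\mu(y)] = \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(y)\, \theta_\mu(y) \sum_j \Big\{ & -2\kappa\, \phi_j^\dagger(y + \hat\mu) \big( 2 - (\rho_j^* + \rho_j)\, \gamma_5 (1 + \gamma_\mu) \big) \phi_j(y) \\
& + 2\kappa^2 \sum_{\rho \neq \mu} \Big[ \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, U_\rho(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho(y + \hat\mu)\, \phi_j^\dagger(y + \hat\mu + \hat\rho)(1 - \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu) \\
& \qquad + \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, U_\rho^\dagger(y)\, \theta_\rho(y) \\
& \qquad + U_\rho^\dagger(y + \hat\mu - \hat\rho)\, \phi_j^\dagger(y + \hat\mu - \hat\rho)(1 + \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu - \hat\rho) \Big] \Big\}.
\end{aligned} \tag{C.13}$$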

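This expression can be implemented efficiently for the case of repeated local gauge field sweeps, as has
already been noted in Sec. 4.3.4. Equation (C.13) admits a representation in the following form

$$\Delta S_f[\Delta U_\mu(y)] = \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(y) \Big\{ C^{(1)}_\mu(y) + \sum_{\rho \neq \mu} \Big[ C^{(2)}_{\mu\rho}(y)\, U_\rho(y - \hat\rho) + U_\rho(y + \hat\mu)\, C^{(3)}_{\mu\rho}(y) + C^{(4)}_{\mu\rho}(y)\, U_\rho^\dagger(y) + U_\rho^\dagger(y + \hat\mu - \hat\rho)\, C^{(5)}_{\mu\rho}(y) \Big] \Big\}, \tag{C.14}$$

with the cache fields $\{C^{(1)}_\mu, C^{(2)}_{\mu\rho}, C^{(3)}_{\mu\rho}, C^{(4)}_{\mu\rho}, C^{(5)}_{\mu\rho}\}$ given by

$$\begin{aligned}
C^{(1)}_\mu(y) &= -2\kappa\, \theta_\mu(y) \sum_j \phi_j^\dagger(y + \hat\mu) \big( 2 - (\rho_j^* + \rho_j)\, \gamma_5 (1 + \gamma_\mu) \big) \phi_j(y), \\
C^{(2)}_{\mu\rho}(y) &= 2\kappa^2\, \theta_\mu(y)\, \theta_\rho(y - \hat\rho) \sum_j \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 + \gamma_\rho)\, \phi_j(y - \hat\rho), \\
C^{(3)}_{\mu\rho}(y) &= 2\kappa^2\, \theta_\mu(y)\, \theta_\rho(y + \hat\mu) \sum_j \phi_j^\dagger(y + \hat\mu + \hat\rho)(1 - \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y), \\
C^{(4)}_{\mu\rho}(y) &= 2\kappa^2\, \theta_\mu(y)\, \theta_\rho(y) \sum_j \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 - \gamma_\rho)\, \phi_j(y + \hat\rho), \\
C^{(5)}_{\mu\rho}(y) &= 2\kappa^2\, \theta_\mu(y)\, \theta_\rho(y + \hat\mu - \hat\rho) \sum_j \phi_j^\dagger(y + \hat\mu - \hat\rho)(1 + \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y).
\end{aligned} \tag{C.15}$$

Any expression similar to (C.13) can be written in the form (C.14).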

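By inserting the matrix (2.81) into Eq. (3.69), one arrives at the corresponding expressions in the
preconditioned case. $P_o$ designates the projector to odd, and $P_e$ the projector to even sites:

$$\begin{aligned}
S_f = \sum_j \sum_y \phi_j^\dagger(y) \Big\{ & \big( 1 + 16\kappa^2 + P_o\, \rho_j^* \rho_j - (P_o\, \rho_j^* + P_o\, \rho_j)\, \gamma_5 \big) \phi_j(y) \\
& + \kappa \sum_\rho \Big[ \big( (P_e\, \rho_j^* + P_o\, \rho_j)\, \gamma_5 (1 + \gamma_\rho) - 2 \big)\, U_\rho(y - \hat\rho)\, \phi_j(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + \big( (P_e\, \rho_j^* + P_o\, \rho_j)\, \gamma_5 (1 - \gamma_\rho) - 2 \big)\, U_\rho^\dagger(y)\, \phi_j(y + \hat\rho)\, \theta_\rho(y) \Big] \\
& + \kappa^2 \sum_{\rho_1 \neq \rho_2} \Big[ U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}(y - \hat\rho_1 - \hat\rho_2)\, (1 - \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}^\dagger(y - \hat\rho_1)\, (1 - \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}(y + \hat\rho_1 - \hat\rho_2)\, (1 + \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}^\dagger(y + \hat\rho_1)\, (1 + \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1) \Big] \Big\}.
\end{aligned} \tag{C.16}$$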



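The corresponding staple $\Delta S_f[\Delta U_\mu(y)]$ takes the form

$$\begin{aligned}
\Delta S_f[\Delta U_\mu(y)] = \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(y)\, \theta_\mu(y) \sum_j \Big\{ & -2\kappa\, \phi_j^\dagger(y + \hat\mu) \big( 2 - (P_e\, \rho_j^* + P_o\, \rho_j)\, \gamma_5 (1 + \gamma_\mu) \big) \phi_j(y) \\
& + 2\kappa^2 \sum_{\rho \neq \mu} \Big[ \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, U_\rho(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho(y + \hat\mu)\, \phi_j^\dagger(y + \hat\mu + \hat\rho)(1 - \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu) \\
& \qquad + \phi_j^\dagger(y + \hat\mu)(1 - \gamma_\mu)(1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, U_\rho^\dagger(y)\, \theta_\rho(y) \\
& \qquad + U_\rho^\dagger(y + \hat\mu - \hat\rho)\, \phi_j^\dagger(y + \hat\mu - \hat\rho)(1 + \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu - \hat\rho) \Big] \Big\}.
\end{aligned} \tag{C.17}$$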



Wilson Fermions (non-Hermitian) 



The fermionic action in terms of the non-Hermitian Wilson fermions is obtained by replacing $\tilde Q(y, x)$
with $Q(y, x)$ in (3.67). Without even-odd preconditioning one arrives then at

$$S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\, \big( Q^\dagger(y, z) - \rho_j^* \big) \big( Q(z, x) - \rho_j \big)\, \phi_j(x). \tag{3.72}$$

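The local fermionic action becomes

$$\begin{aligned}
S_f = \sum_j \sum_y \phi_j^\dagger(y) \Big\{ & \big( 1 + 16\kappa^2 + \rho_j^* \rho_j - (\rho_j^* + \rho_j) \big) \phi_j(y) \\
& - \kappa\, (1 + \rho_j^* + \rho_j) \sum_\rho \Big[ U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, \theta_\rho(y) \Big] \\
& + \kappa^2 \sum_{\rho_1 \neq \rho_2} \Big[ U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}(y - \hat\rho_1 - \hat\rho_2)\, (1 + \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}^\dagger(y - \hat\rho_1)\, (1 + \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}(y + \hat\rho_1 - \hat\rho_2)\, (1 - \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}^\dagger(y + \hat\rho_1)\, (1 - \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1) \Big] \Big\}.
\end{aligned} \tag{C.18}$$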



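The gauge action staples become

$$\begin{aligned}
\Delta S_f[\Delta U_\mu(y)] = \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(y)\, \theta_\mu(y) \sum_j \Big\{ & -2\kappa\, \phi_j^\dagger(y + \hat\mu)\, (1 + \rho_j^* + \rho_j)\, (1 + \gamma_\mu)\, \phi_j(y) \\
& + 2\kappa^2 \sum_{\rho \neq \mu} \Big[ \phi_j^\dagger(y + \hat\mu)(1 + \gamma_\mu)(1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, U_\rho(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho(y + \hat\mu)\, \phi_j^\dagger(y + \hat\mu + \hat\rho)(1 + \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu) \\
& \qquad + \phi_j^\dagger(y + \hat\mu)(1 + \gamma_\mu)(1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, U_\rho^\dagger(y)\, \theta_\rho(y) \\
& \qquad + U_\rho^\dagger(y + \hat\mu - \hat\rho)\, \phi_j^\dagger(y + \hat\mu - \hat\rho)(1 - \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu - \hat\rho) \Big] \Big\}.
\end{aligned} \tag{C.19}$$

Finally the even-odd preconditioned form of (3.72) is given by

$$S_f = \sum_j \sum_{xyz} \phi_j^\dagger(y)\, \big( Q^\dagger(y, z) - P_o\, \rho_j^* \big) \big( Q(z, x) - P_o\, \rho_j \big)\, \phi_j(x). \tag{C.20}$$

This leads to the local fermionic action

$$\begin{aligned}
S_f = \sum_j \sum_y \phi_j^\dagger(y) \Big\{ & \big( 1 + 16\kappa^2 + P_o\, \rho_j^* \rho_j - (P_o\, \rho_j^* + P_o\, \rho_j) \big) \phi_j(y) \\
& - \kappa\, (1 + P_e\, \rho_j^* + P_o\, \rho_j) \sum_\rho \Big[ U_\rho(y - \hat\rho)\, (1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho^\dagger(y)\, (1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, \theta_\rho(y) \Big] \\
& + \kappa^2 \sum_{\rho_1 \neq \rho_2} \Big[ U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}(y - \hat\rho_1 - \hat\rho_2)\, (1 + \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}(y - \hat\rho_1)\, U_{\rho_2}^\dagger(y - \hat\rho_1)\, (1 + \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y - \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y - \hat\rho_1)\, \theta_{\rho_2}(y - \hat\rho_1) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}(y + \hat\rho_1 - \hat\rho_2)\, (1 - \gamma_{\rho_1})(1 + \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 - \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1 - \hat\rho_2) \\
& \qquad + U_{\rho_1}^\dagger(y)\, U_{\rho_2}^\dagger(y + \hat\rho_1)\, (1 - \gamma_{\rho_1})(1 - \gamma_{\rho_2}) \\
& \qquad\quad \times \phi_j(y + \hat\rho_1 + \hat\rho_2)\, \theta_{\rho_1}(y)\, \theta_{\rho_2}(y + \hat\rho_1) \Big] \Big\}.
\end{aligned} \tag{C.21}$$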



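The fermionic contribution to the gauge field staple for non-Hermitian even-odd preconditioned Wilson
fermions is then given by

$$\begin{aligned}
\Delta S_f[\Delta U_\mu(y)] = \mathrm{Re}\, \mathrm{Tr}\, \Delta U_\mu(y)\, \theta_\mu(y) \sum_j \Big\{ & -2\kappa\, \phi_j^\dagger(y + \hat\mu)\, (1 + P_e\, \rho_j^* + P_o\, \rho_j)\, (1 + \gamma_\mu)\, \phi_j(y) \\
& + 2\kappa^2 \sum_{\rho \neq \mu} \Big[ \phi_j^\dagger(y + \hat\mu)(1 + \gamma_\mu)(1 + \gamma_\rho)\, \phi_j(y - \hat\rho)\, U_\rho(y - \hat\rho)\, \theta_\rho(y - \hat\rho) \\
& \qquad + U_\rho(y + \hat\mu)\, \phi_j^\dagger(y + \hat\mu + \hat\rho)(1 + \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu) \\
& \qquad + \phi_j^\dagger(y + \hat\mu)(1 + \gamma_\mu)(1 - \gamma_\rho)\, \phi_j(y + \hat\rho)\, U_\rho^\dagger(y)\, \theta_\rho(y) \\
& \qquad + U_\rho^\dagger(y + \hat\mu - \hat\rho)\, \phi_j^\dagger(y + \hat\mu - \hat\rho)(1 - \gamma_\rho)(1 + \gamma_\mu)\, \phi_j(y)\, \theta_\rho(y + \hat\mu - \hat\rho) \Big] \Big\}.
\end{aligned} \tag{C.22}$$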



This concludes the discussion of local forms of the actions used.

It has become clear in the discussion of the TSMB algorithm that the effort for maintaining
and running a production run to generate a sufficiently large sample of field configurations is enormous.
A large amount of data is generated. Even in the simpler case of the HMC, a large number
of gauge field configurations is generated, which have to be stored on a large file server. In particular,
for the SESAM/TχL projects [161, 133], several TBytes of data have been accumulated.
In the case of a large-scale multiboson production, one may in addition want to change the polynomials
during the production run and thus end up with a selection of sub-samples, each distributed with a different
multicanonical action. Therefore it is essential to have powerful machinery available which allows one
to maintain and use a TByte-sized archive over several years and to conserve the data for potential later
use by other groups¹.

To meet these goals, an SQL-based database system has been devised. The design has been a part of 
the TSMB development in this thesis and it has turned out to be very useful for practical applications. 
In particular, the following components have been developed: 

1. A library to read and write gauge field configurations on variable lattice sizes in the standardized
Gauge Connection format². The library is usable both from C and Fortran and provides access to
the gauge fields in the form of a convenient data structure. Furthermore, a selection of different
proprietary formats is supported, which is used mainly for data exchange with the APE
machines. The majority of gauge field configurations generated on these machines is still available
in this format only³.

2. A database programmed in SQL which employs the fast and efficient MySQL database engine⁴.
Despite lacking certain features of modern databases, it is very suitable for the purpose of
storing information from numerical simulations. The reason is that write accesses (which usually
consist of adding a new configuration) only take place once every few minutes or even hours during a
production run and almost never concurrently. The same is valid for queries: queries are used to
request information for measurements and are unlikely to happen concurrently. Thus, usage of the
MySQL engine appears to be perfectly justified for the purposes of lattice field theory simulations.

3. Programs to support adding configurations to the database and to support specific types of queries.
The database can be accessed from a high-level language via the corresponding interface. This
allows a direct combination with the conversion library discussed above. A further alternative is
to access the database using scripting languages, such as shell scripts or Perl scripts.



¹ The SESAM/TχL groups had already realized that an efficient, standardized system for the storage and handling
of their configurations was in demand. The contributions discussed in the following were originally developed as a
solution to their problems.

² See http://qcd.nersc.gov for a definition and description.

³ The program can be downloaded from
http://www.theorie.physik.uni-wuppertal.de/~wolfram/publications/downloads/unic.tar.gz

⁴ The database can be found at http://www.mysql.com/

D.1 Design of the Database

A number of textbooks are available which describe the design process of a database in detail, see
e.g. [246]. The basic structure of a database is characterized by a set of entities, their corresponding
properties, and relations between the entities. Important design goals are

• avoidance of UPDATE anomalies,

• elimination of redundancies, 

• the creation of an understandable model, 

• and the minimization of restructuring the relations for the introduction of new data types. This 
should prolong the life expectancy of the applications. 

The above points can be satisfied if the underlying database is normalized. There exist a number of
properties the relations need to satisfy for the database to be normalized. The most important ones
are given by the first five normal forms.

Figure D.1 shows the entities together with the relations between them. These ingredients will now be
discussed in detail.




Figure D.l: Entity-relation diagram for the configuration database. 



The entities in the database are given by 

Configurations: Any single gauge field configuration needs to be stored separately. Several pieces of
information are required for the configurations to be reproduced correctly. The Gauge Connection
format stores all necessary information as a part of the file in the header. The Configurations
entity therefore needs to have similar properties. Table D.1 lists all attributes of this entity.

Polynomials: The TSMB algorithm (see Sec. 3.5.3) requires a multicanonical reweighting with a
correction factor depending on the choice of the polynomial used (see Sec. 4.1). Hence, it is important
to know the polynomial the configuration has been sampled with. Therefore, the Polynomials
entity will contain all necessary information about up to three polynomials used. However, if
reweighting is not required (if either the configuration has been sampled using an algorithm
like the HMC or with a multiboson algorithm using an exact correction step), no polynomial
will be associated with the configurations. The relation R1 between the Polynomials and the
Configurations entity is thus c : m. The attributes implemented for Polynomials are displayed in
Tab. D.2.

Attribute      Type (SQL)     Content
Archive_date   DATETIME       Archive date and time
EnsembleID     ENSID
PolynomialID   POLID          Foreign key, references Polynomials
Location       VARCHAR(255)   Complete path to the Gauge Connection file
Comment        TEXT           (Optional) comment

Table D.1: Attributes of the Configurations entity.



Ensembles: For the Monte-Carlo integration schemes as discussed in Sec. 3.1, one has to compute a
sample of gauge field configurations which can then be used to measure a physical quantity with
a certain statistical error. For this procedure it is important to categorize all configurations in
the database into distinct classes according to their physical parameters, the people who contributed
to them etc. This classification is implemented using the Ensembles entity. It is important to
realize that this entity need not classify the configurations only by their physical properties, but
can also categorize the configurations by certain "organizational" considerations, i.e. the origin
of the configurations, the projects they are intended for etc. The relation R3 between Ensembles
and Configurations is 1 : m, i.e. each configuration must be part of one and only one ensemble,
but each ensemble can contain several configurations. The corresponding attributes are shown in
Tab. D.3.

Machines: It is useful to know on which particular machine a certain ensemble has been sampled. 



Attribute   Type (SQL)         Content
POLID       INTEGER            Polynomial identification, primary key
α           DOUBLE PRECISION   Power of the polynomial (cf. Eq. (3.73))
ε           DOUBLE PRECISION   Lower end of polynomial approximation interval
λ           DOUBLE PRECISION   Upper end of polynomial approximation interval
n1          INTEGER            Order of first polynomial
n2          INTEGER            Order of second polynomial
n3          INTEGER            Order of third polynomial
Location    VARCHAR(255)       Complete path to polynomial input file
Comment     TEXT               (Optional) comment

Table D.2: Attributes of the Polynomials entity.

Attribute   Type (SQL)   Content
ENSID       INTEGER      Ensemble identification, primary key
Ensemble    TEXT         Description of ensemble (physical & organizational)
HWID        MACHID       Foreign key, references Machines
Comment     TEXT         (Optional) comment

Table D.3: Attributes of the Ensembles entity.



This is one example of the categorization of the Ensembles entity and the only example which
has been implemented in this thesis. The practical use of this information lies in
efficiency analyses, where one usually performs simulations at equivalent physical parameters, but
on different implementation systems (cf. Sec. 4.3). This is again a 1 : m relation (see relation
R2), since each ensemble has to be created on a particular implementation system, but each
implementation can give rise to several ensembles. Attributes relating to the Machines entity are
given in Tab. D.4.



Attribute   Type (SQL)     Content
MACHID      INTEGER        Machine identification, primary key
Hardware    VARCHAR(255)   Description of hardware

Table D.4: Attributes of the Machines entity.
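The entities and relations described above can be written down as a small schema. The following sketch is an illustration, not the schema used in this work: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL engine, maps the ENSID, POLID and MACHID types to INTEGER, includes only the Configurations attributes listed in Tab. D.1, and introduces a hypothetical primary key `CONFID`. The column names `alpha`, `epsilon` and `lam` are ASCII stand-ins for the symbols in Tab. D.2, and all inserted rows are invented sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Machines (
    MACHID   INTEGER PRIMARY KEY,
    Hardware VARCHAR(255)
);
CREATE TABLE Ensembles (
    ENSID    INTEGER PRIMARY KEY,
    Ensemble TEXT,
    HWID     INTEGER NOT NULL REFERENCES Machines(MACHID),   -- relation R2, 1 : m
    Comment  TEXT
);
CREATE TABLE Polynomials (
    POLID    INTEGER PRIMARY KEY,
    alpha    DOUBLE PRECISION,
    epsilon  DOUBLE PRECISION,
    lam      DOUBLE PRECISION,
    n1 INTEGER, n2 INTEGER, n3 INTEGER,
    Location VARCHAR(255),
    Comment  TEXT
);
CREATE TABLE Configurations (
    CONFID       INTEGER PRIMARY KEY,                              -- hypothetical key
    Archive_date DATETIME,
    EnsembleID   INTEGER NOT NULL REFERENCES Ensembles(ENSID),     -- relation R3, 1 : m
    PolynomialID INTEGER REFERENCES Polynomials(POLID),            -- relation R1, c : m
    Location     VARCHAR(255),
    Comment      TEXT
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")      # SQLite enforces foreign keys only on request
con.executescript(SCHEMA)

con.execute("INSERT INTO Machines VALUES (1, 'example machine')")
con.execute("INSERT INTO Ensembles VALUES (1, 'example ensemble', 1, NULL)")
con.execute("INSERT INTO Polynomials VALUES (1, 1.0, 0.001, 3.0, 20, 40, 60, "
            "'user@host:/archive/poly.dat', NULL)")
# A TSMB configuration references its polynomial; an HMC configuration does not.
con.execute("INSERT INTO Configurations VALUES (1, '2001-01-01 12:00:00', 1, 1, "
            "'user@host:/archive/conf.0001', 'TSMB')")
con.execute("INSERT INTO Configurations VALUES (2, '2001-01-02 12:00:00', 1, NULL, "
            "'user@host:/archive/conf.0002', 'HMC')")

rows = con.execute("""
    SELECT c.Location, m.Hardware, c.PolynomialID
    FROM Configurations c
    JOIN Ensembles e ON e.ENSID = c.EnsembleID
    JOIN Machines  m ON m.MACHID = e.HWID
    ORDER BY c.CONFID
""").fetchall()

# A configuration without a valid ensemble violates the 1 : m relation R3.
try:
    con.execute("INSERT INTO Configurations (EnsembleID, Location) VALUES (99, 'x')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The nullable `PolynomialID` column expresses the c : m relation R1, while the NOT NULL foreign keys express the 1 : m relations R2 and R3.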



Beyond what has been done here, it is possible to introduce further entities to categorize the Ensembles
further, like different research groups or different projects where the configurations are to be used. This
topic has so far been outside the scope of this thesis and has therefore not been implemented.

In the practical implementation, the gauge field configurations cannot be stored in the database itself.
In fact, the storage requirements are enormous: a typical configuration on an $\Omega = 32 \times 16^3$ lattice will
use about 24 MB of RAM, and a typical sample consists of several thousands of these. It is clear that
a dedicated storage device is required. The solution was to store the configurations on a tape archive
installed at the Forschungszentrum Jülich, Germany. The database contains only the path to the
configurations in the archive. If the configurations are in Gauge Connection format, they will contain
redundant information about their physical and logical affiliation. This redundancy ensures that the
archive can also be used independently from the database. For the same reason, the information about
the lattice volume is stored in the Configurations table and not in the Ensembles table, in contrast to
what one would expect from a normalized relation.
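The quoted size of about 24 MB per configuration can be reproduced by a short calculation. The sketch below rests on assumptions that the text does not state explicitly: four links per site, only two rows of each $3 \times 3$ complex link matrix being stored (12 real numbers per link), and single precision. The sample size of 5000 configurations is a hypothetical value standing in for "several thousands".

```python
sites = 32 * 16 ** 3            # lattice volume Omega = 32 x 16^3
links_per_site = 4              # one link per direction
reals_per_link = 12             # assumption: two rows of a 3x3 complex matrix
bytes_per_real = 4              # assumption: single precision

size_bytes = sites * links_per_site * reals_per_link * bytes_per_real
size_mib = size_bytes / 2 ** 20

# For comparison: full 3x3 matrices in double precision.
full_double_mib = sites * links_per_site * 18 * 8 / 2 ** 20

n_configs = 5000                # hypothetical sample size
sample_gib = n_configs * size_bytes / 2 ** 30
```

Under these assumptions one configuration occupies 24 MiB, and a sample of this size exceeds 100 GiB, which illustrates why a dedicated archive is needed.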

With the extended definition of the Configurations entity which also includes the Format and Ordering
properties in Tab. D.1, one is also able to store configurations in formats different from the Gauge
Connection scheme. In particular, all other structures used by the SESAM/TχL collaboration are
supported by the current design. In this case, the information in the table is not redundant and is
required to successfully access a particular configuration. Furthermore the Link_Trace and Plaquette
properties are simple and efficient checksum implementations for these applications.

There is an important subtlety regarding the approximation interval for quadratically optimized
polynomials discussed in Sec. 3.6.1: as used in Tab. D.2, the interval applies to the first and second polynomials
and it is assumed that these intervals are identical. If this is not the case, one will have to store two
sets of $[\epsilon, \lambda]$ values for the two polynomials. The corresponding information about the third polynomial
is not required since it is not used for reweighting purposes. The information about $n_3$ can therefore
also be considered optional.

In all cases, the Location entry should contain sufficient information to uniquely locate a file in the
archive. Therefore, the format user@host:/complete-path-to-file has been used, which allows the
file to be accessed directly using the scp program⁵.
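A Location entry in this format can be split into its parts before it is handed to a transfer program. The following sketch is an illustration only; the function name `parse_location` and the example entry are hypothetical.

```python
def parse_location(location):
    """Split 'user@host:/complete-path-to-file' into (user, host, path)."""
    userhost, sep, path = location.partition(":")
    if not sep or not path.startswith("/"):
        raise ValueError("expected an absolute path after ':' in " + repr(location))
    user, at, host = userhost.partition("@")
    if not at or not user or not host:
        raise ValueError("expected 'user@host' before ':' in " + repr(location))
    return user, host, path

entry = "alice@archive.example.org:/data/ensemble1/conf.0001"
user, host, path = parse_location(entry)
```

Rejecting relative paths and entries without a user or host keeps the entries unambiguous, in line with the requirement that a Location uniquely identifies a file in the archive.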

In conclusion, the configuration database allows one to store all necessary information about gauge field
configurations. It supports different formats and makes it possible to salvage all data from the SESAM/TχL
projects. It uses a modern database design which can be accessed from a variety of different
implementation systems. The configurations in Gauge Connection format can also be accessed independently
from the database.

⁵ The program and documentation can be obtained from http://www.openssh.com/